SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: FLECKENSTEIN, Bernhard 
ENSSER, Armin 

(ii) TITLE OF INVENTION: HUMAN SEMAPHORIN L (H-SEMAL) AND 
CORRESPONDING SEMAPHORINS IN OTHER SPECIES 

(iii) NUMBER OF SEQUENCES: 44 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE : Frommer Lawrence & Haug LLP 

(B) STREET: 745 Fifth Avenue 

(C) CITY: New York 

(D) STATE: New York 

(E) COUNTRY : USA 

(F) ZIP: 10151 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US NYA 

(B) FILING DATE: 09-JUL-1998 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Lawrence, William F. 

(B) REGISTRATION NUMBER: 28,029 

(C) REFERENCE/DOCKET NUMBER: 514429-3647 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 212-588-0800 

(B) TELEFAX: 212-588-0500 



(2) INFORMATION FOR SEQ ID NO : 1 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2636 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



CGGGGCCACG GGATGACGCC TCCTCCGCCC GGACGTGCCG CCCCCAGCGC ACCGCGCGCC 60 

CGCGTCCCTG GCCCGCCGGC TCGGTTGGGG CTTCCGCTGC GGCTGCGGCT GCTGCTGCTG 120 

CTCTGGGCGG CCGCCGCCTC CGCCCAGGGC CACCTAAGGA GCGGACCCCG CATCTTCGCC 180 

GTCTGGAAAG GCCATGTAGG GCAGGACCGG GTGGACTTTG GCCAGACTGA GCCGCACACG 240

GTGCTTTTCC ACGAGCCAGG CAGCTCCTCT GTGTGGGTGG GAGGACGTGG CAAGGTCTAC 300

CTCTTTGACT TCCCCGAGGG CAAGAACGCA TCTGTGCGCA CGGTGAATAT CGGCTCCACA 360 

AAGGGGTCCT GTCTGGATAA GCGGGACTGC GAGAACTACA TCACTCTCCT GGAGAGGCGG 420 

AGTGAGGGGC TGCTGGCCTG TGGCACCAAC GCCCGGCACC CCAGCTGCTG GAACCTGGTG 480

AATGGCACTG TGGTGCCACT TGGCGAGATG AGAGGCTACG CCCCCTTCAG CCCGGACGAG 540 

AACTCCCTGG TTCTGTTTGA AGGGGACGAG GTGTATTCCA CCATCCGGAA GCAGGAATAC 600 

AATGGGAAGA TCCCTCGGTT CCGCCGCATC CGGGGCGAGA GTGAGCTGTA CACCAGTGAT 660 

ACTGTCATGC AGAACCCACA GTTCATCAAA GCCACCATCG TGCACCAAGA CCAGGCTTAC 720

GATGACAAGA TCTACTACTT CTTCCGAGAG GACAATCCTG ACAAGAATCC TGAGGCTCCT 780

CTCAATGTGT CCCGTGTGGC CCAGTTGTGC AGGGGGGACC AGGGTGGGGA AAGTTCACTG 840

TCAGTCTCCA AGTGGAACAC TTTTCTGAAA GCCATGCTGG TATGCAGTGA TGCTGCCACC 900

AACAAGAACT TCAACAGGCT GCAAGACGTC TTCCTGCTCC CTGACCCCAG CGGCCAGTGG 960

AGGGACACCA GGGTCTATGG TGTTTTCTCC AACCCCTGGA ACTACTCAGC CGTCTGTGTG 1020

TATTCCCTCG GTGACATTGA CAAGGTCTTC CGTACCTCCT CACTCAAGGG CTACCACTCA 1080

AGCCTTCCCA ACCCGCGGCC TGGCAAGTGC CTCCCAGACC AGCAGCCGAT ACCCACAGAG 1140

ACCTTCCAGG TGGCTGACCG TCACCCAGAG GTGGCGCAGA GGGTGGAGCC CATGGGGCCT 1200

CTGAAGACGC CATTGTTCCA CTCTAAATAC CACTACCAGA AAGTGGCCGT TCACCGCATG 1260 

CAAGCCAGCC ACGGGGAGAC CTTTCATGTG CTTTACCTAA CTACAGACAG GGGCACTATC 1320 

CACAAGGTGG TGGAACCGGG GGAGCAGGAG CACAGCTTCG CCTTCAACAT CATGGAGATC 1380 

CAGCCCTTCC GCCGCGCGGC TGCCATCCAG ACCATGTCGC TGGATGCTGA GCGGAGGAAG 1440

CTGTATGTGA GCTCCCAGTG GGAGGTGAGC CAGGTGCCCC TGGACCTGTG TGAGGTCTAT 1500 

GGCGGGGGCT GCCACGGTTG CCTCATGTCC CGAGACCCCT ACTGCGGCTG GGACCAGGGC 1560 

CGCTGCATCT CCATCTACAG CTCCGAACGG TCAGTGCTGC AATCCATTAA TCCAGCCGAG 1620

CCACACAAGG AGTGTCCCAA CCCCAAACCA GACAAGGCCC CACTGCAGAA GGTTTCCCTG 1680

GCCCCAAACT CTCGCTACTA CCTGAGCTGC CCCATGGAAT CCCGCCACGC CACCTACTCA 1740

TGGCGCCACA AGGAGAACGT GGAGCAGAGC TGCGAACCTG GTCACCAGAG CCCCAACTGC

ATCCTGTTCA TCGAGAACCT CACGGCGCAG CAGTACGGCC ACTACTTCTG CGAGGCCCAG 

GAGGGCTCCT ACTTCCGCGA GGCTCAGCAC TGGCAGCTGC TGCCCGAGGA CGGCATCATG 

GCCGAGCACC TGCTGGGTCA TGCCTGTGCC CTGGCTGCCT CCCTCTGGCT GGGGGTGCTG 

CCCACACTCA CTCTTGGCTT GCTGGTCCAC TAGGGCCTCC CGAGGCTGGG CATGCCTCAG 

GCTTCTGCAG CCCAGGGCAC TAGAACGTCT CACACTCAGA GCCGGCTGGC CCGGGAGCTC 

CTTGCCTGCC ACTTCTTCCA GGGGACAGAA TAACCCAGTG GAGGATGCCA GGCCTGGAGA 

CGTCCAGCCG CAGGCGGCTG CTGGGCCCCA GGTGGCGCAC GGATGGTGAG GGGCTGAGAA 

TGAGGGCACC GACTGTGAAG CTGGGGCATC GATGACCCAA GACTTTATCT TCTGGAAAAT 

ATTTTTCAGA CTCCTCAAAC TTGACTAAAT GCAGCGATGC TCCCAGCCCA AGAGCCCATG 

GGTCGGGGAG TGGGTTTGGA TAGGAGAGCT GGGACTCCAT CTCGACCCTG GGGCTGAGGC 

CTGAGTCCTT CTGGACTCTT GGTACCCACA TTGCCTCCTT CCCCTCCCTC TCTCATGGCT 

GGGTGGCTGG TGTTCCTGAA GACCCAGGGC TACCCTCTGT CCAGCCCTGT CCTCTGCAGC 

TCCCTCTCTG GTCCTGGGTC CCACAGGACA GCCGCCTTGC ATGTTTATTG AAGGATGTTT 

GCTTTCCGGA CGGAAGGACG GAAAAAGCTC TGAAAAAAAA AAAAAAAAAA AAAAAA 
(2) INFORMATION FOR SEQ ID NO : 2 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1195 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 2 : 
CGGGGCTGCG GGATGACGCC TCCTCCTCCC GGACGTGCCG CCCCCAGCGC ACCGCGCGCC 
CGCGTCCTCA GCCTGCCGGC TCGGTTCGGG CTCCCGCTGC GGCTGCGGCT TCTGCTGGTG 
TTCTGGGTGG CCGCCGCCTC CGCCCAAGGC CACTCGAGGA GCGGACCCCG CATCTCCGCC 
GTCTGGAAAG GGCAGGACCA TGTGGACTTT AGCCAGCCTG AGCCACACAC CGTGCTTTTC 
CATGAGCCGG GCAGCTTCTC TGTCTGGGTG GGTGGACGTG GCAAGGTCTA CCACTTCAAC 
TTCCCCGAGG GCAAGAATGC CTCTGTGCGC ACGGTGAACA TCGGCTCCAC AAAGGGGTCC 
Glu Pro Gly Ser Ser Ser Val Trp Val Gly Gly Arg Gly Lys Val Tyr
85 90 95 

Leu Phe Asp Phe Pro Glu Gly Lys Asn Ala Ser Val Arg Thr Val Asn 
100 105 110 

Ile Gly Ser Thr Lys Gly Ser Cys Leu Asp Lys Arg Asp Cys Glu Asn
115 120 125

Tyr Ile Thr Leu Leu Glu Arg Arg Ser Glu Gly Leu Leu Ala Cys Gly
130 135 140 

Thr Asn Ala Arg His Pro Ser Cys Trp Asn Leu Val Asn Gly Thr Val 
145 150 155 160 

Val Pro Leu Gly Glu Met Arg Gly Tyr Ala Pro Phe Ser Pro Asp Glu 
165 170 175 

Asn Ser Leu Val Leu Phe Glu Gly Asp Glu Val Tyr Ser Thr Ile Arg
180 185 190

Lys Gln Glu Tyr Asn Gly Lys Ile Pro Arg Phe Arg Arg Ile Arg Gly
195 200 205

Glu Ser Glu Leu Tyr Thr Ser Asp Thr Val Met Gln Asn Pro Gln Phe
210 215 220

Ile Lys Ala Thr Ile Val His Gln Asp Gln Ala Tyr Asp Asp Lys Ile
225 230 235 240 

Tyr Tyr Phe Phe Arg Glu Asp Asn Pro Asp Lys Asn Pro Glu Ala Pro 
245 250 255 

Leu Asn Val Ser Arg Val Ala Gln Leu Cys Arg Gly Asp Gln Gly Gly
260 265 270

Glu Ser Ser Leu Ser Val Ser Lys Trp Asn Thr Phe Leu Lys Ala Met
275 280 285

Leu Val Cys Ser Asp Ala Ala Thr Asn Lys Asn Phe Asn Arg Leu Gln
290 295 300

Asp Val Phe Leu Leu Pro Asp Pro Ser Gly Gln Trp Arg Asp Thr Arg
305 310 315 320

Val Tyr Gly Val Phe Ser Asn Pro Trp Asn Tyr Ser Ala Val Cys Val
325 330 335

Tyr Ser Leu Gly Asp Ile Asp Lys Val Phe Arg Thr Ser Ser Leu Lys
340 345 350 

Gly Tyr His Ser Ser Leu Pro Asn Pro Arg Pro Gly Lys Cys Leu Pro 
355 360 365 



Asp Gln Gln Pro Ile Pro Thr Glu Thr Phe Gln Val Ala Asp Arg His
370 375 380

Pro Glu Val Ala Gln Arg Val Glu Pro Met Gly Pro Leu Lys Thr Pro
385 390 395 400 

Leu Phe His Ser Lys Tyr His Tyr Gln Lys Val Ala Val His Arg Met
405 410 415

Gln Ala Ser His Gly Glu Thr Phe His Val Leu Tyr Leu Thr Thr Asp
420 425 430

Arg Gly Thr Ile His Lys Val Val Glu Pro Gly Glu Gln Glu His Ser
435 440 445

Phe Ala Phe Asn Ile Met Glu Ile Gln Pro Phe Arg Arg Ala Ala Ala
450 455 460

Ile Gln Thr Met Ser Leu Asp Ala Glu Arg Arg Lys Leu Tyr Val Ser
465 470 475 480 

Ser Gln Trp Glu Val Ser Gln Val Pro Leu Asp Leu Cys Glu Val Tyr
485 490 495

Gly Gly Gly Cys His Gly Cys Leu Met Ser Arg Asp Pro Tyr Cys Gly
500 505 510

Trp Asp Gln Gly Arg Cys Ile Ser Ile Tyr Ser Ser Glu Arg Ser Val
515 520 525

Leu Gln Ser Ile Asn Pro Ala Glu Pro His Lys Glu Cys Pro Asn Pro
530 535 540

Lys Pro Asp Lys Ala Pro Leu Gln Lys Val Ser Leu Ala Pro Asn Ser
545 550 555 560

Arg Tyr Tyr Leu Ser Cys Pro Met Glu Ser Arg His Ala Thr Tyr Ser
565 570 575

Trp Arg His Lys Glu Asn Val Glu Gln Ser Cys Glu Pro Gly His Gln
580 585 590

Ser Pro Asn Cys Ile Leu Phe Ile Glu Asn Leu Thr Ala Gln Gln Tyr
595 600 605

Gly His Tyr Phe Cys Glu Ala Gln Glu Gly Ser Tyr Phe Arg Glu Ala
610 615 620

Gln His Trp Gln Leu Leu Pro Glu Asp Gly Ile Met Ala Glu His Leu
625 630 635 640 

Leu Gly His Ala Cys Ala Leu Ala Ala Ser Leu Trp Leu Gly Val Leu 
645 650 655 

Pro Thr Leu Thr Leu Gly Leu Leu Val His 
660 665 



(2) INFORMATION FOR SEQ ID NO : 4 : 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 394 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: n/a 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: amino acid 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 4 : 

Met Thr Pro Pro Pro Pro Gly Arg Ala Ala Pro Ser Ala Pro Arg Ala 
1 5 10 15

Arg Val Leu Ser Leu Pro Ala Arg Phe Gly Leu Pro Leu Arg Leu Arg 
20 25 30 

Leu Leu Leu Val Phe Trp Val Ala Ala Ala Ser Ala Gln Gly His Ser
35 40 45

Arg Ser Gly Pro Arg Ile Ser Ala Val Trp Lys Gly Gln Asp His Val
50 55 60

Asp Phe Ser Gln Pro Glu Pro His Thr Val Leu Phe His Glu Pro Gly
65 70 75 80 

Ser Phe Ser Val Trp Val Gly Gly Arg Gly Lys Val Tyr His Phe Asn 
85 90 95 

Phe Pro Glu Gly Lys Asn Ala Ser Val Arg Thr Val Asn Ile Gly Ser
100 105 110

Thr Lys Gly Ser Cys Gln Asp Lys Gln Asp Cys Gly Asn Tyr Ile Thr
115 120 125 

Leu Leu Glu Arg Arg Gly Asn Gly Leu Leu Val Cys Gly Thr Asn Ala 
130 135 140 

Arg Lys Pro Ser Cys Trp Asn Leu Val Asn Asp Ser Val Val Met Ser 
145 150 155 160 

Leu Gly Glu Met Lys Gly Tyr Ala Pro Phe Ser Pro Asp Glu Asn Ser 
165 170 175 

Leu Val Leu Phe Glu Gly Asp Glu Val Tyr Ser Thr Ile Arg Lys Gln
180 185 190

Glu Tyr Asn Gly Lys Ile Pro Arg Phe Arg Arg Ile Arg Gly Glu Ser
195 200 205

Glu Leu Tyr Thr Ser Asp Thr Val Met Gln Asn Pro Gln Phe Ile Lys
210 215 220

Ala Thr Ile Val His Gln Asp Gln Ala Tyr Asp Asp Lys Ile Tyr Tyr
225 230 235 240

Phe Phe Arg Glu Asp Asn Pro Asp Lys Asn Pro Glu Ala Pro Leu Asn
245 250 255 

Val Ser Arg Val Ala Gln Leu Cys Arg Gly Asp Gln Gly Gly Glu Ser
260 265 270

Ser Leu Ser Val Ser Lys Trp Asn Thr Phe Leu Lys Ala Met Leu Val
275 280 285

Cys Ser Asp Ala Ala Thr Asn Arg Asn Phe Asn Arg Leu Gln Asp Val
290 295 300

Phe Leu Leu Pro Asp Pro Ser Gly Gln Trp Arg Asp Thr Arg Val Tyr
305 310 315 320

Gly Val Phe Ser Asn Pro Trp Asn Tyr Ser Ala Val Cys Val Tyr Ser
325 330 335

Leu Gly Asp Ile Asp Arg Val Phe Arg Thr Ser Ser Leu Lys Gly Tyr
340 345 350

His Met Gly Leu Ser Asn Pro Arg Pro Gly Met Cys Leu Pro Lys Lys
355 360 365

Gln Pro Ile Pro Thr Glu Thr Phe Gln Val Ala Asp Ser His Pro Glu
370 375 380

Val Ala Gln Arg Val Glu Pro Met Gly Pro
385 390 

(2) INFORMATION FOR SEQ ID NO : 5 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 5 : 
ACTCACTATA GGGCTCGAGC GGC 
(2) INFORMATION FOR SEQ ID NO : 6 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6 
AGCCGCACAC GGTGCTTTTC 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7 
GCACAGATGC GTTCTTGCCC 
(2) INFORMATION FOR SEQ ID NO : 8 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 8 
ACCATAGACC CTGGTGTCCC 
(2) INFORMATION FOR SEQ ID NO : 9 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 9 



GCAGTGATGC TGCCACCAAC 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10 
CCAGACCATG TCGCTGGATG 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 
ACATGAGGCA ACCGTGGCAG 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 
CCATCCTAAT ACGACTCACT ATAGGGC 
(2) INFORMATION FOR SEQ ID NO:13: 
(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13 
AGGTAGACCT TGCCACGTCC 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14 
GAACTTCAAC AGGCTGCAAG ACG 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15 
ATGCTGAGCG GAGGAAGCTG 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16 
CCGCCATACA CCTCACACAG 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17 
CTGGAAGCTT TCTGTGGGTA TCGGCTGC 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 
TTTGGATCCC TGGTTCTGTT TGAAG 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19 



TTCTAGAATT CAGCGGCCGC TTTTTTTTTT TTTTTTTTTT TTTTTTTTTT



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GGGGAAAGTT CACTGTCAGT CTCCAAG 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 
GGGAATACAC ACAGACGGCT GAGTAG 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

AGCAAGTTCA GCCTGGTTAA GT 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs



(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 
TTATGAGTAT TTCTTCCAGG G 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 
CCATTAATCC AGCCGAGCCA CACAAG 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 
CATCTACAGC TCCGAACGGT CAGTG 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26 
CAGCGGAAGC CCCAACCGAG 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 
GGGATGACGC CTCCTCCGCC CGG 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28 
AAGCTTCACG TGGACCAGCA AGCCAAGAGT G 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29 
AAGCTTTTTC CGTCCTTCCG TCCGG 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30 
ATGGTGAGCA AGGGCGAGGA GCTG 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 
CTTGTACAGC TCGTCCATGC CGAG 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 
GGGTGGTGAG AGTTCGTTGT CTGTC 
(2) INFORMATION FOR SEQ ID NO: 33: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 



(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GAGCGATGAG GTACGGAAGA CTCTG 25 
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5856 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 60

ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 120

TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 180

TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTTC 240

ACGTGGACCA GCAAGCCAAG AGTGAGTGTG GGCAGCACCC CCAGCCAGAG GGAGGCAGCC 300

AGGGCACAGG CATGACCCAG CAGGTGCTCG GCCATGATGC CGTCCTCGGG CAGCAGCTGC 360 

CAGTGCTGAG CCTCGCGGAA GTAGGAGCCC TCCTGGGCCT CGCAGAAGTA GTGGCCGTAC 420 

TGCTGCGCCG TGAGGTTCTC GATGAACAGG ATGCAGTTGG GGCTCTGGTG ACCAGGTTCG 480

CAGCTCTGCT CCACGTTCTC CTTGTGGCGC CATGAGTAGG TGGCGTGGCG GGATTCCATG 540

GGGCAGCTCA GGTAGTAGCG AGAGTTTGGG GCCAGGGAAA CCTTCTGCAG TGGGGCCTTG 600

TCTGGTTTGG GGTTGGGACA CTCCTTGTGT GGCTCGGCTG GATTAATGGA TTGCAGCACT 660

GACCGTTCGG AGCTGTAGAT GGAGATGCAG CGGCCCTGGT CCCAGCCGCA GTAGGGGTCT 720

CGGGACATGA GGCAACCGTG GCAGCCCCCG CCATAGACCT CACACAGGTC CAGGGGCACC 780 

TGGCTCACCT CCCACTGGGA GCTCACATAC AGCTTCCTCC GCTCAGCATC CAGCGACATG 840 

GTCTGGATGG CAGCCGCGCG GCGGAAGGGC TGGATCTCCA TGATGTTGAA GGCGAAGCTG 900 



TCTAAATCGG GGGCTCCCTT TAGGGTTCCG ATTTAGAGCT TTACGGCACC TCGACCGCAA 2700 

AAAACTTGAT TTGGGTGATG GTTCACGTAG TGGGCCATCG CCCTGATAGA CGGTTTTTCG 2760 

CCCTTTGACG TTGGAGTCCA CGTTCTTTAA TAGTGGACTC TTGTTCCAAA CTGGAACAAC 2820

ACTCAACCCT ATCGCGGTCT ATTCTTTTGA TTTATAAGGG ATTTTGCCGA TTTCGGCCTA 2880

TTGGTTAAAA AATGAGCTGA TTTAACAAAT TCAGGGCGCA AGGGCTGCTA AAGGAACCGG 2940

AACACGTAGA AAGCCAGTCC GCAGAAACGG TGCTGACCCC GGATGAATGT CAGCTACTGG 3000

GCTATCTGGA CAAGGGAAAA CGCAAGCGCA AAGAGAAAGC AGGTAGCTTG CAGTGGGCTT 3060 

ACATGGCGAT AGCTAGACTG GGCGGTTTTA TGGACAGCAA GCGAACCGGA ATTGCCAGCT 3120 

GGGGCGCCCT CTGGTAAGGT TGGGAAGCCC TGCAAAGTAA ACTGGATGGC TTTCTTGCCG 3180 

CCAAGGATCT GATGGCGCAG GGGATCAAGA TCTGATCAAG AGACAGGATG AGGATCGTTT 3240 

CGCATGATTG AACAAGATGG ATTGCACGCA GGTTCTCCGG CCGCTTGGGT GGAGAGGCTA 3300 

TTCGGCTATG ACTGGGCACA ACAGACAATC GGCTGCTCTG ATGCCGCCGT GTTCCGGCTG 3360

TCAGCGCAGG GGCGCCCGGT TCTTTTTGTC AAGACCGACC TGTCCGGTGC CCTGAATGAA 3420

CTGCAGGACG AGGCAGCGCG GCTATCGTGG CTGGCCACGA CGGGCGTTCC TTGCGCAGCT 3480

GTGCTCGACG TTGTCACTGA AGCGGGAAGG GACTGGCTGC TATTGGGCGA AGTGCCGGGG 3540

CAGGATCTCC TGTCATCTCG CCTTGCTCCT GCCGAGAAAG TATCCATCAT GGCTGATGCA 3600

ATGCGGCGGC TGCATACGCT TGATCCGGCT ACCTGCCCAT TCGACCACCA AGCGAAACAT 3660

CGCATCGAGC GAGCACGTAC TCGGATGGAA GCCGGTCTTG TCGATCAGGA TGATCTGGAC 3720

GAAGAGCATC AGGGGCTCGC GCCAGCCGAA CTGTTCGCCA GGCTCAAGGC GCGCATGCCC 3780

GACGGCGAGG ATCTCGTCGT GATCCATGGC GATGCCTGCT TGCCGAATAT CATGGTGGAA 3840

AATGGCCGCT TTTCTGGATT CAACGACTGT GGCCGGCTGG GTGTGGCGGA CCGCTATCAG 3900

GACATAGCGT TGGATACCCG TGATATTGCT GAAGAGCTTG GCGGCGAATG GGCTGACCGC 3960

TTCCTCGTGC TTTACGGTAT CGCCGCTCCC GATTCGCAGC GCATCGCCTT CTATCGCCTT 4020

CTTGACGAGT TCTTCTGAAT TGAAAAAGGA AGAGTATGAG TATTCAACAT TTCCGTGTCG 4080

CCCTTATTCC CTTTTTTGCG GCATTTTGCC TTCCTGTTTT TGCTCACCCA GAAACGCTGG 4140

TGAAAGTAAA AGATGCTGAA GATCAGTTGG GTGCACGAGT GGGTTACATC GAACTGGATC 4200

TCAACAGCGG TAAGATCCTT GAGAGTTTTC GCCCCGAAGA ACGTTTTCCA ATGATGAGCA 4260 

CTTTTAAAGT TCTGCTATGT CATACACTAT TATCCCGTAT TGACGCCGGG CAAGAGCAAC 4320 



TCGGTCGCCG GGCGCGGTAT TCTCAGAATG ACTTGGTTGA GTACTCACCA GTCACAGAAA 4380 

AGCATCTTAC GGATGGCATG ACAGTAAGAG AATTATGCAG TGCTGCCATA ACCATGAGTG 4440 

ATAACACTGC GGCCAACTTA CTTCTGACAA CGATCGGAGG ACCGAAGGAG CTAACCGCTT 4500

TTTTGCACAA CATGGGGGAT CATGTAACTC GCCTTGATCG TTGGGAACCG GAGCTGAATG 4560

AAGCCATACC AAACGACGAG AGTGACACCA CGATGCCTGT AGCAATGCCA ACAACGTTGC 4620

GCAAACTATT AACTGGCGAA CTACTTACTC TAGCTTCCCG GCAACAATTA ATAGACTGGA 4680

TGGAGGCGGA TAAAGTTGCA GGACCACTTC TGCGCTCGGC CCTTCCGGCT GGCTGGTTTA 4740

TTGCTGATAA ATCTGGAGCC GGTGAGCGTG GGTCTCGCGG TATCATTGCA GCACTGGGGC 4800 

CAGATGGTAA GCCCTCCCGT ATCGTAGTTA TCTACACGAC GGGGAGTCAG GCAACTATGG 4860 

ATGAACGAAA TAGACAGATC GCTGAGATAG GTGCCTCACT GATTAAGCAT TGGTAACTGT 4920 

CAGACCAAGT TTACTCATAT ATACTTTAGA TTGATTTAAA ACTTCATTTT TAATTTAAAA 4980 

GGATCTAGGT GAAGATCCTT TTTGATAATC TCATGACCAA AATCCCTTAA CGTGAGTTTT 5040

CGTTCCACTG AGCGTCAGAC CCCGTAGAAA AGATCAAAGG ATCTTCTTGA GATCCTTTTT 5100

TTCTGCGCGT AATCTGCTGC TTGCAAACAA AAAAACCACC GCTACCAGCG GTGGTTTGTT 5160

TGCCGGATCA AGAGCTACCA ACTCTTTTTC CGAAGGTAAC TGGCTTCAGC AGAGCGCAGA 5220

TACCAAATAC TGTCCTTCTA GTGTAGCCGT AGTTAGGCCA CCACTTCAAG AACTCTGTAG 5280

CACCGCCTAC ATACCTCGCT CTGCTAATCC TGTTACCAGT GGCTGCTGCC AGTGGCGATA 5340

AGTCGTGTCT TACCGGGTTG GACTCAAGAC GATAGTTACC GGATAAGGCG CAGCGGTCGG 5400

GCTGAACGGG GGGTTCGTGC ACACAGCCCA GCTTGGAGCG AACGACCTAC ACCGAACTGA 5460

GATACCTACA GCGTGAGCAT TGAGAAAGCG CCACGCTTCC CGAAGGGAGA AAGGCGGACA 5520

GGTATCCGGT AAGCGGCAGG GTCGGAACAG GAGAGCGCAC GAGGGAGCTT CCAGGGGGAA 5580 

ACGCCTGGTA TCTTTATAGT CCTGTCGGGT TTCGCCACCT CTGACTTGAG CGTCGATTTT 5640 

TGTGATGCTC GTCAGGGGGG CGGAGCCTAT GGAAAAACGC CAGCAACGCG GCCTTTTTAC 5700 

GGTTCCTGGC CTTTTGCTGG CCTTTTGCTC ACATGTTCTT TCCTGCGTTA TCCCCTGATT 5760 

CTGTGGATAA CCGTATTACC GCCTTTGAGT GAGCTGATAC CGCTCGCCGC AGCCGAACGA 5820 

CCGAGCGCAG CGAGTCAGTG AGCGAGGAAG CGGAAG 5856 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7475 base pairs 

(B) TYPE: nucleic acid 



(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

GACGGATCGG GAGATCTCCC GATCCCCTAT GGTCGACTCT CAGTACAATC TGCTCTGATG 60 

CCGCATAGTT AAGCCAGTAT CTGCTCCCTG CTTGTGTGTT GGAGGTCGCT GAGTAGTGCG 120 

CGAGCAAAAT TTAAGCTACA ACAAGGCAAG GCTTGACCGA CAATTGCATG AAGAATCTGC 180 

TTAGGGTTAG GCGTTTTGCG CTGCTTCGCG ATGTACGGGC CAGATATACG CGTTGACATT 240 

GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA 300

TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC 360 

CCCGCCCATT GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC 420

ATTGACGTCA ATGGGTGGAC TATTTACGGT AAACTGCCCA CTTGGCAGTA CATCAAGTGT 480

ATCATATGCC AAGTACGCCC CCTATTGACG TCAATGACGG TAAATGGCCC GCCTGGCATT 540

ATGCCCAGTA CATGACCTTA TGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA 600

TCGCTATTAC CATGGTGATG CGGTTTTGGC AGTACATCAA TGGGCGTGGA TAGCGGTTTG 660 

ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG TTTTGGCACC 720

AAAATCAACG GGACTTTCCA AAATGTCGTA ACAACTCCGC CCCATTGACG CAAATGGGCG 780 

GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCT CTGGCTAACT AGAGAACCCA 840 

CTGCTTACTG GCTTATCGAA ATTAATACGA CTCACTATAG GGAGACCCAA GCTGGCTAGC 900

GTTTAAACGG GCCCTCTAGA CTCGAGCGGC CGCCACTGTG CTGGATATCT GCAGAATTCG 960

GCTTGGGATG ACGCCTCCTC CGCCCGGACG TGCCGCCCCC AGCGCACCGC GCGCCCGCGT 1020 

CCCTGGCCCG CCGGCTCGGT TGGGGCTTCC GCTGCGGCTG CGGCTGCTGC TGCTGCTCTG 1080 

GGCGGCCGCC GCCTCCGCCC AGGGCCACCT AAGGAGCGGA CCCCGCATCT TCGCCGTCTG 1140 

GAAAGGCCAT GTAGGGCAGG ACCGGGTGGA CTTTGGCCAG ACTGAGCCGC ACACGGTGCT 1200

TTTCCACGAG CCAGGCAGCT CCTCTGTGTG GGTGGGAGGA CGTGGCAAGG TCTACCTCTT 1260

TGACTTCCCC GAGGGCAAGA ACGCATCTGT GCGCACGGTG AATATCGGCT CCACAAAGGG 1320

GTCCTGTCTG GATAAGCGGG ACTGCGAGAA CTACATCACT CTCCTGGAGA GGCGGAGTGA 1380 

GGGGCTGCTG GCCTGTGGCA CCAACGCCCG GCACCCCAGC TGCTGGAACC TGGTGAATGG 1440 



CACTGTGGTG CCACTTGGCG AGATGAGAGG CTACGCCCCC TTCAGCCCGG ACGAGAACTC 1500 

CCTGGTTCTG TTTGAAGGGG ACGAGGTGTA TTCCACCATC CGGAAGCAGG AATACAATGG 1560 

GAAGATCCCT CGGTTCCGCC GCATCCGGGG CGAGAGTGAG CTGTACACCA GTGATACTGT 1620

CATGCAGAAC CCACAGTTCA TCAAAGCCAC CATCGTGCAC CAAGACCAGG CTTACGATGA 1680 

CAAGATCTAC TACTTCTTCC GAGAGGACAA TCCTGACAAG AATCCTGAGG CTCCTCTCAA 1740 

TGTGTCCCGT GTGGCCCAGT TGTGCAGGGG GGACCAGGGT GGGGAAAGTT CACTGTCAGT 1800 

CTCCAAGTGG AACACTTTTC TGAAAGCCAT GCTGGTATGC AGTGATGCTG CCACCAACAA 1860 

GAACTTCAAC AGGCTGCAAG ACGTCTTCCT GCTCCCTGAC CCCAGCGGCC AGTGGAGGGA 1920

CACCAGGGTC TATGGTGTTT TCTCCAACCC CTGGAACTAC TCAGCCGTCT GTGTGTATTC 1980

CCTCGGTGAC ATTGACAAGG TCTTCCGTAC CTCCTCACTC AAGGGCTACC ACTCAAGCCT 2040

TCCCAACCCG CGGCCTGGCA AGTGCCTCCC AGACCAGCAG CCGATACCCA CAGAGACCTT 2100

CCAGGTGGCT GACCGTCACC CAGAGGTGGC GCAGAGGGTG GAGCCCATGG GGCCTCTGAA 2160

GACGCCATTG TTCCACTCTA AATACCACTA CCAGAAAGTG GCCGTTCACC GCATGCAAGC 2220

CAGCCACGGG GAGACCTTTC ATGTGCTTTA CCTAACTACA GACAGGGGCA CTATCCACAA 2280

GGTGGTGGAA CCGGGGGAGC AGGAGCACAG CTTCGCCTTC AACATCATGG AGATCCAGCC 2340

CTTCCGCCGC GCGGCTGCCA TCCAGACCAT GTCGCTGGAT GCTGAGCGGA GGAAGCTGTA 2400

TGTGAGCTCC CAGTGGGAGG TGAGCCAGGT GCCCCTGGAC CTGTGTGAGG TCTATGGCGG 2460

GGGCTGCCAC GGTTGCCTCA TGTCCCGAGA CCCCTACTGC GGCTGGGACC AGGGCCGCTG 2520

CATCTCCATC TACAGCTCCG AACGGTCAGT GCTGCAATCC ATTAATCCAG CCGAGCCACA 2580

CAAGGAGTGT CCCAACCCCA AACCAGACAA GGCCCCACTG CAGAAGGTTT CCCTGGCCCC 2640

AAACTCTCGC TACTACCTGA GCTGCCCCAT GGAATCCCGC CACGCCACCT ACTCATGGCG 2700 

CCACAAGGAG AACGTGGAGC AGAGCTGCGA ACCTGGTCAC CAGAGCCCCA ACTGCATCCT 2760 

GTTCATCGAG AACCTCACGG CGCAGCAGTA CGGCCACTAC TTCTGCGAGG CCCAGGAGGG 2820 

CTCCTACTTC CGCGAGGCTC AGCACTGGCA GCTGCTGCCC GAGGACGGCA TCATGGCCGA 2880 

GCACCTGCTG GGTCATGCCT GTGCCCTGGC TGCCTCCCTC TGGCTGGGGG TGCTGCCCAC 2940

ACTCACTCTT GGCTTGCTGG TCCACGTGAA GCTTGGGCCC GAACAAAAAC TCATCTCAGA 3000 

AGAGGATCTG AATAGCGCCG TCGACCATCA TCATCATCAT CATTGAGTTT AAACCGCTGA 3060 

TCAGCCTCGA CTGTGCCTTC TAGTTGCCAG CCATCTGTTG TTTGCCCCTC CCCCGTGCCT 3120 



TCCTTGACCC TGGAAGGTGC CACTCCCACT GTCCTTTCCT AATAAAATGA GGAAATTGCA 3180

TCGCATTGTC TGAGTAGGTG TCATTCTATT CTGGGGGGTG GGGTGGGGCA GGACAGCAAG 3240

GGGGAGGATT GGGAAGACAA TAGCAGGCAT GCTGGGGATG CGGTGGGCTC TATGGCTTCT 3300

GAGGCGGAAA GAACCAGCTG GGGCTCTAGG GGGTATCCCC ACGCGCCCTG TAGCGGCGCA 3360

TTAAGCGCGG CGGGTGTGGT GGTTACGCGC AGCGTGACCG CTACACTTGC CAGCGCCCTA 3420

GCGCCCGCTC CTTTCGCTTT CTTCCCTTCC TTTCTCGCCA CGTTCGCCGG CTTTCCCCGT 3480

CAAGCTCTAA ATCGGGGCAT CCCTTTAGGG TTCCGATTTA GTGCTTTACG GCACCTCGAC 3540

CCCAAAAAAC TTGATTAGGG TGATGGTTCA CGTAGTGGGC CATCGCCCTG ATAGACGGTT 3600

TTTCGCCCTT TGACGTTGGA GTCCACGTTC TTTAATAGTG GACTCTTGTT CCAAACTGGA 3660

ACAACACTCA ACCCTATCTC GGTCTATTCT TTTGATTTAT AAGGGATTTT GGGGATTTCG 3720

GCCTATTGGT TAAAAAATGA GCTGATTTAA CAAAAATTTA ACGCGAATTA ATTCTGTGGA 3780

ATGTGTGTCA GTTAGGGTGT GGAAAGTCCC CAGGCTCCCC AGGCAGGCAG AAGTATGCAA 3840

AGCATGCATC TCAATTAGTC AGCAACCAGG TGTGGAAAGT CCCCAGGCTC CCCAGCAGGC 3900

AGAAGTATGC AAAGCATGCA TCTCAATTAG TCAGCAACCA TAGTCCCGCC CCTAACTCCG 3960

CCCATCCCGC CCCTAACTCC GCCCAGTTCC GCCCATTCTC CGCCCCATGG CTGACTAATT 4020

TTTTTTATTT ATGCAGAGGC CGAGGCCGCC TCTGCCTCTG AGCTATTCCA GAAGTAGTGA 4080

GGAGGCTTTT TTGGAGGCCT AGGCTTTTGC AAAAAGCTCC CGGGAGCTTG TATATCCATT 4140

TTCGGATCTG ATCAAGAGAC AGGATGAGGA TCGTTTCGCA TGATTGAACA AGATGGATTG 4200

CACGCAGGTT CTCCGGCCGC TTGGGTGGAG AGGCTATTCG GCTATGACTG GGCACAACAG 4260

ACAATCGGCT GCTCTGATGC CGCCGTGTTC CGGCTGTCAG CGCAGGGGCG CCCGGTTCTT 4320

TTTGTCAAGA CCGACCTGTC CGGTGCCCTG AATGAACTGC AGGACGAGGC AGCGCGGCTA 4380

TCGTGGCTGG CCACGACGGG CGTTCCTTGC GCAGCTGTGC TCGACGTTGT CACTGAAGCG 4440

GGAAGGGACT GGCTGCTATT GGGCGAAGTG CCGGGGCAGG ATCTCCTGTC ATCTCACCTT 4500

GCTCCTGCCG AGAAAGTATC CATCATGGCT GATGCAATGC GGCGGCTGCA TACGCTTGAT 4560

CCGGCTACCT GCCCATTCGA CCACCAAGCG AAACATCGCA TCGAGCGAGC ACGTACTCGG 4620

ATGGAAGCCG GTCTTGTCGA TCAGGATGAT CTGGACGAAG AGCATCAGGG GCTCGCGCCA 4680

GCCGAACTGT TCGCCAGGCT CAAGGCGCGC ATGCCCGACG GCGAGGATCT CGTCGTGACC 4740

CATGGCGATG CCTGCTTGCC GAATATCATG GTGGAAAATG GCCGCTTTTC TGGATTCATC 4800

GACTGTGGCC GGCTGGGTGT GGCGGACCGC TATCAGGACA TAGCGTTGGC TACCCGTGAT 4860



ATTGCTGAAG AGCTTGGCGG CGAATGGGCT GACCGCTTCC TCGTGCTTTA CGGTATCGCC 4920 

GCTCCCGATT CGCAGCGCAT CGCCTTCTAT CGCCTTCTTG ACGAGTTCTT CTGAGCGGGA 4980 

CTCTGGGGTT CGAAATGACC GACCAAGCGA CGCCCAACCT GCCATCACGA GATTTCGATT 5040

CCACCGCCGC CTTCTATGAA AGGTTGGGCT TCGGAATCGT TTTCCGGGAC GCCGGCTGGA 5100

TGATCCTCCA GCGCGGGGAT CTCATGCTGG AGTTCTTCGC CCACCCCAAC TTGTTTATTG 5160

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 5220 

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGTA 5280 

TACCGTCGAC CTCTAGCTAG AGCTTGGCGT AATCATGGTC ATAGCTGTTT CCTGTGTGAA 5340 

ATTGTTATCC GCTCACAATT CCACACAACA TACGAGCCGG AAGCATAAAG TGTAAAGCCT 5400 

GGGGTGCCTA ATGAGTGAGC TAACTCACAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC 5460 

AGTCGGGAAA CCTGTCGTGC CAGCTGCATT AATGAATCGG CCAACGCGCG GGGAGAGGCG 5520

GTTTGCGTAT TGGGCGCTCT TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC 5580

GGCTGCGGCG AGCGGTATCA GCTCACTCAA AGGCGGTAAT ACGGTTATCC ACAGAATCAG 5640

GGGATAACGC AGGAAAGAAC ATGTGAGCAA AAGGCCAGCA AAAGGCCAGG AACCGTAAAA 5700

AGGCCGCGTT GCTGGCGTTT TTCCATAGGC TCCGCCCCCC TGACGAGCAT CACAAAAATC 5760

GACGCTCAAG TCAGAGGTGG CGAAACCCGA CAGGACTATA AAGATACCAG GCGTTTCCCC 5820

CTGGAAGCTC CCTCGTGCGC TCTCCTGTTC CGACCCTGCC GCTTACCGGA TACCTGTCCG 5880

CCTTTCTCCC TTCGGGAAGC GTGGCGCTTT CTCAATGCTC ACGCTGTAGG TATCTCAGTT 5940

CGGTGTAGGT CGTTCGCTCC AAGCTGGGCT GTGTGCACGA ACCCCCCGTT CAGCCCGACC 6000

GCTGCGCCTT ATCCGGTAAC TATCGTCTTG AGTCCAACCC GGTAAGACAC GACTTATCGC 6060

CACTGGCAGC AGCCACTGGT AACAGGATTA GCAGAGCGAG GTATGTAGGC GGTGCTACAG 6120

AGTTCTTGAA GTGGTGGCCT AACTACGGCT ACACTAGAAG GACAGTATTT GGTATCTGCG 6180

CTCTGCTGAA GCCAGTTACC TTCGGAAAAA GAGTTGGTAG CTCTTGATCC GGCAAACAAA 6240

CCACCGCTGG TAGCGGTGGT TTTTTTGTTT GCAAGCAGCA GATTACGCGC AGAAAAAAAG 6300

GATCTCAAGA AGATCCTTTG ATCTTTTCTA CGGGGTCTGA CGCTCAGTGG AACGAAAACT 6360

CACGTTAAGG GATTTTGGTC ATGAGATTAT CAAAAAGGAT CTTCACCTAG ATCCTTTTAA 6420 

ATTAAAAATG AAGTTTTAAA TCAATCTAAA GTATATATGA GTAAACTTGG TCTGACAGTT 6480 

ACCAATGCTT AATCAGTGAG GCACCTATCT CAGCGATCTG TCTATTTCGT TCATCCATAG 6540 



TTGCCTGACT CCCCGTCGTG TAGATAACTA CGATACGGGA GGGCTTACCA TCTGGCCCCA 6600

GTGCTGCAAT GATACCGCGA GACCCACGCT CACCGGCTCC AGATTTATCA GCAATAAACC 6660 

AGCCAGCCGG AAGGGCCGAG CGCAGAAGTG GTCCTGCAAC TTTATCCGCC TCCATCCAGT 6720 

CTATTAATTG TTGCCGGGAA GCTAGAGTAA GTAGTTCGCC AGTTAATAGT TTGCGCAACG 6780 

TTGTTGCCAT TGCTACAGGC ATCGTGGTGT CACGCTCGTC GTTTGGTATG GCTTCATTCA 6840 

GCTCCGGTTC CCAACGATCA AGGCGAGTTA CATGATCCCC CATGTTGTGC AAAAAAGCGG 6900

TTAGCTCCTT CGGTCCTCCG ATCGTTGTCA GAAGTAAGTT GGCCGCAGTG TTATCACTCA 6960

TGGTTATGGC AGCACTGCAT AATTCTCTTA CTGTCATGCC ATCCGTAAGA TGCTTTTCTG 7020

TGACTGGTGA GTACTCAACC AAGTCATTCT GAGAATAGTG TATGCGGCGA CCGAGTTGCT 7080 

CTTGCCCGGC GTCAATACGG GATAATACCG CGCCACATAG CAGAACTTTA AAAGTGCTCA 7140 

TCATTGGAAA ACGTTCTTCG GGGCGAAAAC TCTCAAGGAT CTTACCGCTG TTGAGATCCA 7200 

GTTCGATGTA ACCCACTCGT GCACCCAACT GATCTTCAGC ATCTTTTACT TTCACCAGCG 7260

TTTCTGGGTG AGCAAAAACA GGAAGGCAAA ATGCCGCAAA AAAGGGAATA AGGGCGACAC 7320

GGAAATGTTG AATACTCATA CTCTTCCTTT TTCAATATTA TTGAAGCATT TATCAGGGTT 7380

ATTGTCTCAT GAGCGGATAC ATATTTGAAT GTATTTAGAA AAATAAACAA ATAGGGGTTC 7440

CGCGCACATT TCCCCGAAAA GTGCCACCTG ACGTC 7475

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8192 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

GACGGATCGG GAGATCTCCC GATCCCCTAT GGTCGACTCT CAGTACAATC TGCTCTGATG 60

CCGCATAGTT AAGCCAGTAT CTGCTCCCTG CTTGTGTGTT GGAGGTCGCT GAGTAGTGCG 120

CGAGCAAAAT TTAAGCTACA ACAAGGCAAG GCTTGACCGA CAATTGCATG AAGAATCTGC 180 

TTAGGGTTAG GCGTTTTGCG CTGCTTCGCG ATGTACGGGC CAGATATACG CGTTGACATT 240 

GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA 300



TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC 360 

CCCGCCCATT GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC 420 

ATTGACGTCA ATGGGTGGAC TATTTACGGT AAACTGCCCA CTTGGCAGTA CATCAAGTGT 480 

ATCATATGCC AAGTACGCCC CCTATTGACG TCAATGACGG TAAATGGCCC GCCTGGCATT 540 

ATGCCCAGTA CATGACCTTA TGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA 600 

TCGCTATTAC CATGGTGATG CGGTTTTGGC AGTACATCAA TGGGCGTGGA TAGCGGTTTG 660 

ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG TTTTGGCACC 720

AAAATCAACG GGACTTTCCA AAATGTCGTA ACAACTCCGC CCCATTGACG CAAATGGGCG 780 

GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCT CTGGCTAACT AGAGAACCCA 840

CTGCTTACTG GCTTATCGAA ATTAATACGA CTCACTATAG GGAGACCCAA GCTGGCTAGC 900 

GTTTAAACGG GCCCTCTAGA CTCGAGCGGC CGCCACTGTG CTGGATATCT GCAGAATTCG 960

GCTTGGGATG ACGCCTCCTC CGCCCGGACG TGCCGCCCCC AGCGCACCGC GCGCCCGCGT 1020

CCCTGGCCCG CCGGCTCGGT TGGGGCTTCC GCTGCGGCTG CGGCTGCTGC TGCTGCTCTG 1080

GGCGGCCGCC GCCTCCGCCC AGGGCCACCT AAGGAGCGGA CCCCGCATCT TCGCCGTCTG 1140 

GAAAGGCCAT GTAGGGCAGG ACCGGGTGGA CTTTGGCCAG ACTGAGCCGC ACACGGTGCT 1200

TTTCCACGAG CCAGGCAGCT CCTCTGTGTG GGTGGGAGGA CGTGGCAAGG TCTACCTCTT 1260 

TGACTTCCCC GAGGGCAAGA ACGCATCTGT GCGCACGGTG AATATCGGCT CCACAAAGGG 1320

GTCCTGTCTG GATAAGCGGG ACTGCGAGAA CTACATCACT CTCCTGGAGA GGCGGAGTGA 1380

GGGGCTGCTG GCCTGTGGCA CCAACGCCCG GCACCCCAGC TGCTGGAACC TGGTGAATGG 1440

CACTGTGGTG CCACTTGGCG AGATGAGAGG CTACGCCCCC TTCAGCCCGG ACGAGAACTC 1500 

CCTGGTTCTG TTTGAAGGGG ACGAGGTGTA TTCCACCATC CGGAAGCAGG AATACAATGG 1560 

GAAGATCCCT CGGTTCCGCC GCATCCGGGG CGAGAGTGAG CTGTACACCA GTGATACTGT 1620

CATGCAGAAC CCACAGTTCA TCAAAGCCAC CATCGTGCAC CAAGACCAGG CTTACGATGA 1680 

CAAGATCTAC TACTTCTTCC GAGAGGACAA TCCTGACAAG AATCCTGAGG CTCCTCTCAA 1740 

TGTGTCCCGT GTGGCCCAGT TGTGCAGGGG GGACCAGGGT GGGGAAAGTT CACTGTCAGT 1800 

CTCCAAGTGG AACACTTTTC TGAAAGCCAT GCTGGTATGC AGTGATGCTG CCACCAACAA 1860 

GAACTTCAAC AGGCTGCAAG ACGTCTTCCT GCTCCCTGAC CCCAGCGGCC AGTGGAGGGA 1920

CACCAGGGTC TATGGTGTTT TCTCCAACCC CTGGAACTAC TCAGCCGTCT GTGTGTATTC 1980

CCTCGGTGAC ATTGACAAGG TCTTCCGTAC CTCCTCACTC AAGGGCTACC ACTCAAGCCT 2040






GGATCTGAAT AGCGCCGTCG ACCATCATCA TCATCATCAT TGAGTTTAAA CCGCTGATCA 3780
GCCTCGACTG TGCCTTCTAG TTGCCAGCCA TCTGTTGTTT GCCCCTCCCC CGTGCCTTCC 3840
TTGACCCTGG AAGGTGCCAC TCCCACTGTC CTTTCCTAAT AAAATGAGGA AATTGCATCG 3900
CATTGTCTGA GTAGGTGTCA TTCTATTCTG GGGGGTGGGG TGGGGCAGGA CAGCAAGGGG 3960
GAGGATTGGG AAGACAATAG CAGGCATGCT GGGGATGCGG TGGGCTCTAT GGCTTCTGAG 4020
GCGGAAAGAA CCAGCTGGGG CTCTAGGGGG TATCCCCACG CGCCCTGTAG CGGCGCATTA 4080
AGCGCGGCGG GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG 4140
CCCGCTCCTT TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA 4200
GCTCTAAATC GGGGCATCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC 4260
AAAAAACTTG ATTAGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT 4320
CGCCCTTTGA CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA 4380
ACACTCAACC CTATCTCGGT CTATTCTTTT GATTTATAAG GGATTTTGGG GATTTCGGCC 4440
TATTGGTTAA AAAATGAGCT GATTTAACAA AAATTTAACG CGAATTAATT CTGTGGAATG 4500
TGTGTCAGTT AGGGTGTGGA AAGTCCCCAG GCTCCCCAGG CAGGCAGAAG TATGCAAAGC 4560
ATGCATCTCA ATTAGTCAGC AACCAGGTGT GGAAAGTCCC CAGGCTCCCC AGCAGGCAGA 4620
AGTATGCAAA GCATGCATCT CAATTAGTCA GCAACCATAG TCCCGCCCCT AACTCCGCCC 4680
ATCCCGCCCC TAACTCCGCC CAGTTCCGCC CATTCTCCGC CCCATGGCTG ACTAATTTTT 4740
TTTATTTATG CAGAGGCCGA GGCCGCCTCT GCCTCTGAGC TATTCCAGAA GTAGTGAGGA 4800
GGCTTTTTTG GAGGCCTAGG CTTTTGCAAA AAGCTCCCGG GAGCTTGTAT ATCCATTTTC 4860
GGATCTGATC AAGAGACAGG ATGAGGATCG TTTCGCATGA TTGAACAAGA TGGATTGCAC 4920
GCAGGTTCTC CGGCCGCTTG GGTGGAGAGG CTATTCGGCT ATGACTGGGC ACAACAGACA 4980
ATCGGCTGCT CTGATGCCGC CGTGTTCCGG CTGTCAGCGC AGGGGCGCCC GGTTCTTTTT 5040
GTCAAGACCG ACCTGTCCGG TGCCCTGAAT GAACTGCAGG ACGAGGCAGC GCGGCTATCG 5100
TGGCTGGCCA CGACGGGCGT TCCTTGCGCA GCTGTGCTCG ACGTTGTCAC TGAAGCGGGA 5160
AGGGACTGGC TGCTATTGGG CGAAGTGCCG GGGCAGGATC TCCTGTCATC TCACCTTGCT 5220
CCTGCCGAGA AAGTATCCAT CATGGCTGAT GCAATGCGGC GGCTGCATAC GCTTGATCCG 5280
GCTACCTGCC CATTCGACCA CCAAGCGAAA CATCGCATCG AGCGAGCACG TACTCGGATG 5340
GAAGCCGGTC TTGTCGATCA GGATGATCTG GACGAAGAGC ATCAGGGGCT CGCGCCAGCC 5400
GAACTGTTCG CCAGGCTCAA GGCGCGCATG CCCGACGGCG AGGATCTCGT CGTGACCCAT 5460



GGCGATGCCT GCTTGCCGAA TATCATGGTG GAAAATGGCC GCTTTTCTGG ATTCATCGAC 5520
TGTGGCCGGC TGGGTGTGGC GGACCGCTAT CAGGACATAG CGTTGGCTAC CCGTGATATT 5580
GCTGAAGAGC TTGGCGGCGA ATGGGCTGAC CGCTTCCTCG TGCTTTACGG TATCGCCGCT 5640
CCCGATTCGC AGCGCATCGC CTTCTATCGC CTTCTTGACG AGTTCTTCTG AGCGGGACTC 5700
TGGGGTTCGA AATGACCGAC CAAGCGACGC CCAACCTGCC ATCACGAGAT TTCGATTCCA 5760
CCGCCGCCTT CTATGAAAGG TTGGGCTTCG GAATCGTTTT CCGGGACGCC GGCTGGATGA 5820
TCCTCCAGCG CGGGGATCTC ATGCTGGAGT TCTTCGCCCA CCCCAACTTG TTTATTGCAG 5880
CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 5940
CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGTATAC 6000
CGTCGACCTC TAGCTAGAGC TTGGCGTAAT CATGGTCATA GCTGTTTCCT GTGTGAAATT 6060
GTTATCCGCT CACAATTCCA CACAACATAC GAGCCGGAAG CATAAAGTGT AAAGCCTGGG 6120
GTGCCTAATG AGTGAGCTAA CTCACATTAA TTGCGTTGCG CTCACTGCCC GCTTTCCAGT 6180
CGGGAAACCT GTCGTGCCAG CTGCATTAAT GAATCGGCCA ACGCGCGGGG AGAGGCGGTT 6240
TGCGTATTGG GCGCTCTTCC GCTTCCTCGC TCACTGACTC GCTGCGCTCG GTCGTTCGGC 6300
TGCGGCGAGC GGTATCAGCT CACTCAAAGG CGGTAATACG GTTATCCACA GAATCAGGGG 6360
ATAACGCAGG AAAGAACATG TGAGCAAAAG GCCAGCAAAA GGCCAGGAAC CGTAAAAAGG 6420
CCGCGTTGCT GGCGTTTTTC CATAGGCTCC GCCCCCCTGA CGAGCATCAC AAAAATCGAC 6480
GCTCAAGTCA GAGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG TTTCCCCCTG 6540
GAAGCTCCCT CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT TACCGGATAC CTGTCCGCCT 6600
TTCTCCCTTC GGGAAGCGTG GCGCTTTCTC AATGCTCACG CTGTAGGTAT CTCAGTTCGG 6660
TGTAGGTCGT TCGCTCCAAG CTGGGCTGTG TGCACGAACC CCCCGTTCAG CCCGACCGCT 6720
GCGCCTTATC CGGTAACTAT CGTCTTGAGT CCAACCCGGT AAGACACGAC TTATCGCCAC 6780
TGGCAGCAGC CACTGGTAAC AGGATTAGCA GAGCGAGGTA TGTAGGCGGT GCTACAGAGT 6840
TCTTGAAGTG GTGGCCTAAC TACGGCTACA CTAGAAGGAC AGTATTTGGT ATCTGCGCTC 6900
TGCTGAAGCC AGTTACCTTC GGAAAAAGAG TTGGTAGCTC TTGATCCGGC AAACAAACCA 6960
CCGCTGGTAG CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT TACGCGCAGA AAAAAAGGAT 7020
CTCAAGAAGA TCCTTTGATC TTTTCTACGG GGTCTGACGC TCAGTGGAAC GAAAACTCAC 7080
GTTAAGGGAT TTTGGTCATG AGATTATCAA AAAGGATCTT CACCTAGATC CTTTTAAATT 7140



AAAAATGAAG TTTTAAATCA ATCTAAAGTA TATATGAGTA AACTTGGTCT GACAGTTACC 7200

AATGCTTAAT CAGTGAGGCA CCTATCTCAG CGATCTGTCT ATTTCGTTCA TCCATAGTTG 7260

CCTGACTCCC CGTCGTGTAG ATAACTACGA TACGGGAGGG CTTACCATCT GGCCCCAGTG 7320

CTGCAATGAT ACCGCGAGAC CCACGCTCAC CGGCTCCAGA TTTATCAGCA ATAAACCAGC 7380

CAGCCGGAAG GGCCGAGCGC AGAAGTGGTC CTGCAACTTT ATCCGCCTCC ATCCAGTCTA 7440 

TTAATTGTTG CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT TAATAGTTTG CGCAACGTTG 7500 

TTGCCATTGC TACAGGCATC GTGGTGTCAC GCTCGTCGTT TGGTATGGCT TCATTCAGCT 7560 

CCGGTTCCCA ACGATCAAGG CGAGTTACAT GATCCCCCAT GTTGTGCAAA AAAGCGGTTA 7620 

GCTCCTTCGG TCCTCCGATC GTTGTCAGAA GTAAGTTGGC CGCAGTGTTA TCACTCATGG 7680 

TTATGGCAGC ACTGCATAAT TCTCTTACTG TCATGCCATC CGTAAGATGC TTTTCTGTGA 7740 

CTGGTGAGTA CTCAACCAAG TCATTCTGAG AATAGTGTAT GCGGCGACCG AGTTGCTCTT 7800

GCCCGGCGTC AATACGGGAT AATACCGCGC CACATAGCAG AACTTTAAAA GTGCTCATCA 7860

TTGGAAAACG TTCTTCGGGG CGAAAACTCT CAAGGATCTT ACCGCTGTTG AGATCCAGTT 7920

CGATGTAACC CACTCGTGCA CCCAACTGAT CTTCAGCATC TTTTACTTTC ACCAGCGTTT 7980

CTGGGTGAGC AAAAACAGGA AGGCAAAATG CCGCAAAAAA GGGAATAAGG GCGACACGGA 8040

AATGTTGAAT ACTCATACTC TTCCTTTTTC AATATTATTG AAGCATTTAT CAGGGTTATT 8100

GTCTCATGAG CGGATACATA TTTGAATGTA TTTAGAAAAA TAAACAAATA GGGGTTCCGC 8160

GCACATTTCC CCGAAAAGTG CCACCTGACG TC 8192

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7000 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

AGATCTCGGC CGCATATTAA GTGCATTGTT CTCGATACCG CTAAGTGCAT TGTTCTCGTT 60 

AGCTCGATGG ACAAGTGCAT TGTTCTCTTG CTGAAAGCTC GATGGACAAG TGCATTGTTC 120 

TCTTGCTGAA AGCTCGATGG ACAAGTGCAT TGTTCTCTTG CTGAAAGCTC AGTACCCGGG 180 



AGTACCCTCG ACCGCCGGAG TATAAATAGA GGCGCTTCGT CTACGGAGCG ACAATTCAAT 240 

TCAAACAAGC AAAGTGAACA CGTCGCTAAG CGAAAGCTAA GCAAATAAAC AAGCGCAGCT 300 

GAACAAGCTA AACAATCTGC AGTAAAGTGC AAGTTAAAGT GAATCAATTA AAAGTAACCA 360

GCAACCAAGT AAATCAACTG CAACTACTGA AATCTGCCAA GAAGTAATTA TTGAATACAA 420 

GAAGAGAACT CTGAATACTT TCAACAAGTT ACCGAGAAAG AAGAACTCAC ACACAGCTAG 480

CGTTTAAACT TAAGCTTGGT ACCGAGCTCG GATCCACTAG TCCAGTGTGG TGGAATTCGG 540

CTTGGGATGA CGCCTCCTCC GCCCGGACGT GCCGCCCCCA GCGCACCGCG CGCCCGCGTC 600 

CCTGGCCCGC CGGCTCGGTT GGGGCTTCCG CTGCGGCTGC GGCTGCTGCT GCTGCTCTGG 660 

GCGGCCGCCG CCTCCGCCCA GGGCCACCTA AGGAGCGGAC CCCGCATCTT CGCCGTCTGG 720

AAAGGCCATG TAGGGCAGGA CCGGGTGGAC TTTGGCCAGA CTGAGCCGCA CACGGTGCTT 780 

TTCCACGAGC CAGGCAGCTC CTCTGTGTGG GTGGGAGGAC GTGGCAAGGT CTACCTCTTT 840

GACTTCCCCG AGGGCAAGAA CGCATCTGTG CGCACGGTGA ATATCGGCTC CACAAAGGGG 900

TCCTGTCTGG ATAAGCGGGA CTGCGAGAAC TACATCACTC TCCTGGAGAG GCGGAGTGAG 960

GGGCTGCTGG CCTGTGGCAC CAACGCCCGG CACCCCAGCT GCTGGAACCT GGTGAATGGC 1020

ACTGTGGTGC CACTTGGCGA GATGAGAGGC TACGCCCCCT TCAGCCCGGA CGAGAACTCC 1080 

CTGGTTCTGT TTGAAGGGGA CGAGGTGTAT TCCACCATCC GGAAGCAGGA ATACAATGGG 1140 

AAGATCCCTC GGTTCCGCCG CATCCGGGGC GAGAGTGAGC TGTACACCAG TGATACTGTC 1200

ATGCAGAACC CACAGTTCAT CAAAGCCACC ATCGTGCACC AAGACCAGGC TTACGATGAC 1260

AAGATCTACT ACTTCTTCCG AGAGGACAAT CCTGACAAGA ATCCTGAGGC TCCTCTCAAT 1320

GTGTCCCGTG TGGCCCAGTT GTGCAGGGGG GACCAGGGTG GGGAAAGTTC ACTGTCAGTC 1380

TCCAAGTGGA ACACTTTTCT GAAAGCCATG CTGGTATGCA GTGATGCTGC CACCAACAAG 1440 

AACTTCAACA GGCTGCAAGA CGTCTTCCTG CTCCCTGACC CCAGCGGCCA GTGGAGGGAC 1500 

ACCAGGGTCT ATGGTGTTTT CTCCAACCCC TGGAACTACT CAGCCGTCTG TGTGTATTCC 1560 

CTCGGTGACA TTGACAAGGT CTTCCGTACC TCCTCACTCA AGGGCTACCA CTCAAGCCTT 1620 

CCCAACCCGC GGCCTGGCAA GTGCCTCCCA GACCAGCAGC CGATACCCAC AGAGACCTTC 1680 

CAGGTGGCTG ACCGTCACCC AGAGGTGGCG CAGAGGGTGG AGCCCATGGG GCCTCTGAAG 1740 

ACGCCATTGT TCCACTCTAA ATACCACTAC CAGAAAGTGG CCGTTCACCG CATGCAAGCC 1800 

AGCCACGGGG AGACCTTTCA TGTGCTTTAC CTAACTACAG ACAGGGGCAC TATCCACAAG 1860

GTGGTGGAAC CGGGGGAGCA GGAGCACAGC TTCGCCTTCA ACATCATGGA GATCCAGCCC 1920

TTCCGCCGCG CGGCTGCCAT CCAGACCATG TCGCTGGATG CTGAGCGGAG GAAGCTGTAT 1980

GTGAGCTCCC AGTGGGAGGT GAGCCAGGTG CCCCTGGACC TGTGTGAGGT CTATGGCGGG 2040

GGCTGCCACG GTTGCCTCAT GTCCCGAGAC CCCTACTGCG GCTGGGACCA GGGCCGCTGC 2100 

ATCTCCATCT ACAGCTCCGA ACGGTCAGTG CTGCAATCCA TTAATCCAGC CGAGCCACAC 2160

AAGGAGTGTC CCAACCCCAA ACCAGACAAG GCCCCACTGC AGAAGGTTTC CCTGGCCCCA 2220

AACTCTCGCT ACTACCTGAG CTGCCCCATG GAATCCCGCC ACGCCACCTA CTCATGGCGC 2280

CACAAGGAGA ACGTGGAGCA GAGCTGCGAA CCTGGTCACC AGAGCCCCAA CTGCATCCTG 2340 

TTCATCGAGA ACCTCACGGC GCAGCAGTAC GGCCACTACT TCTGCGAGGC CCAGGAGGGC 2400

TCCTACTTCC GCGAGGCTCA GCACTGGCAG CTGCTGCCCG AGGACGGCAT CATGGCCGAG 2460 

CACCTGCTGG GTCATGCCTG TGCCCTGGCT GCCTCCCTCT GGCTGGGGGT GCTGCCCACA 2520

CTCACTCTTG GCTTGCTGGT CCACGTGAAG CTTGGGCCCG TTTAAACCCG CTGATCAGCC 2580

TCGACTGTGC CTTCTAGTTG CCAGCCATCT GTTGTTTGCC CCTCCCCCGT GCCTTCCTTG 2640 

ACCCTGGAAG GTGCCACTCC CACTGTCCTT TCCTAATAAA ATGAGGAAAT TGCATCGCAT 2700

TGTCTGAGTA GGTGTCATTC TATTCTGGGG GGTGGGGTGG GGCAGGACAG CAAGGGGGAG 2760

GATTGGGAAG ACAATAGCAG GCATGCTGGG GATGCGGTGG GCTCTATGGC TTCTGAGGCG 2820

GAAAGAACCA GCTGGGGCTC TAGGGGGTAT CCCCACGCGC CCTGTAGCGG CGCATTAAGC 2880

GCGGCGGGTG TGGTGGTTAC GCGCAGCGTG ACCGCTACAC TTGCCAGCGC CCTAGCGCCC 2940

GCTCCTTTCG CTTTCTTCCC TTCCTTTCTC GCCACGTTCG CCGGCTTTCC CCGTCAAGCT 3000

CTAAATCGGG GCATCCCTTT AGGGTTCCGA TTTAGTGCTT TACGGCACCT CGACCCCAAA 3060

AAACTTGATT AGGGTGATGG TTCACGTAGT GGGCCATCGC CCTGATAGAC GGTTTTTCGC 3120

CCTTTGACGT TGGAGTCCAC GTTCTTTAAT AGTGGACTCT TGTTCCAAAC TGGAACAACA 3180

CTCAACCCTA TCTCGGTCTA TTCTTTTGAT TTATAAGGGA TTTTGGGGAT TTCGGCCTAT 3240

TGGTTAAAAA ATGAGCTGAT TTAACAAAAA TTTAACGCGA ATTAATTCTG TGGAATGTGT 3300 

GTCAGTTAGG GTGTGGAAAG TCCCCAGGCT CCCCAGGCAG GCAGAAGTAT GCAAAGCATG 3360

CATCTCAATT AGTCAGCAAC CAGGTGTGGA AAGTCCCCAG GCTCCCCAGC AGGCAGAAGT 3420

ATGCAAAGCA TGCATCTCAA TTAGTCAGCA ACCATAGTCC CGCCCCTAAC TCCGCCCATC 3480 

CCGCCCCTAA CTCCGCCCAG TTCCGCCCAT TCTCCGCCCC ATGGCTGACT AATTTTTTTT 3540 

ATTTATGCAG AGGCCGAGGC CGCCTCTGCC TCTGAGCTAT TCCAGAAGTA GTGAGGAGGC 3600 



TTTTTTGGAG GCCTAGGCTT TTGCAAAAAG CTCCCGGGAG CTTGTATATC CATTTTCGGA 3660
TCTGATCAAG AGACAGGATG AGGATCGTTT CGCATGATTG AACAAGATGG ATTGCACGCA 3720
GGTTCTCCGG CCGCTTGGGT GGAGAGGCTA TTCGGCTATG ACTGGGCACA ACAGACAATC 3780
GGCTGCTCTG ATGCCGCCGT GTTCCGGCTG TCAGCGCAGG GGCGCCCGGT TCTTTTTGTC 3840
AAGACCGACC TGTCCGGTGC CCTGAATGAA CTGCAGGACG AGGCAGCGCG GCTATCGTGG 3900
CTGGCCACGA CGGGCGTTCC TTGCGCAGCT GTGCTCGACG TTGTCACTGA AGCGGGAAGG 3960
GACTGGCTGC TATTGGGCGA AGTGCCGGGG CAGGATCTCC TGTCATCTCA CCTTGCTCCT 4020
GCCGAGAAAG TATCCATCAT GGCTGATGCA ATGCGGCGGC TGCATACGCT TGATCCGGCT 4080
ACCTGCCCAT TCGACCACCA AGCGAAACAT CGCATCGAGC GAGCACGTAC TCGGATGGAA 4140
GCCGGTCTTG TCGATCAGGA TGATCTGGAC GAAGAGCATC AGGGGCTCGC GCCAGCCGAA 4200
CTGTTCGCCA GGCTCAAGGC GCGCATGCCC GACGGCGAGG ATCTCGTCGT GACCCATGGC 4260
GATGCCTGCT TGCCGAATAT CATGGTGGAA AATGGCCGCT TTTCTGGATT CATCGACTGT 4320
GGCCGGCTGG GTGTGGCGGA CCGCTATCAG GACATAGCGT TGGCTACCCG TGATATTGCT 4380
GAAGAGCTTG GCGGCGAATG GGCTGACCGC TTCCTCGTGC TTTACGGTAT CGCCGCTCCC 4440
GATTCGCAGC GCATCGCCTT CTATCGCCTT CTTGACGAGT TCTTCTGAGC GGGACTCTGG 4500
GGTTCGAAAT GACCGACCAA GCGACGCCCA ACCTGCCATC ACGAGATTTC GATTCCACCG 4560
CCGCCTTCTA TGAAAGGTTG GGCTTCGGAA TCGTTTTCCG GGACGCCGGC TGGATGATCC 4620
TCCAGCGCGG GGATCTCATG CTGGAGTTCT TCGCCCACCC CAACTTGTTT ATTGCAGCTT 4680
ATAATGGTTA CAAATAAAGC AATAGCATCA CAAATTTCAC AAATAAAGCA TTTTTTTCAC 4740
TGCATTCTAG TTGTGGTTTG TCCAAACTCA TCAATGTATC TTATCATGTC TGTATACCGT 4800
CGACCTCTAG CTAGAGCTTG GCGTAATCAT GGTCATAGCT GTTTCCTGTG TGAAATTGTT 4860
ATCCGCTCAC AATTCCACAC AACATACGAG CCGGAAGCAT AAAGTGTAAA GCCTGGGGTG 4920
CCTAATGAGT GAGCTAACTC ACATTAATTG CGTTGCGCTC ACTGCCCGCT TTCCAGTCGG 4980
GAAACCTGTC GTGCCAGCTG CATTAATGAA TCGGCCAACG CGCGGGGAGA GGCGGTTTGC 5040
GTATTGGGCG CTCTTCCGCT TCCTCGCTCA CTGACTCGCT GCGCTCGGTC GTTCGGCTGC 5100
GGCGAGCGGT ATCAGCTCAC TCAAAGGCGG TAATACGGTT ATCCACAGAA TCAGGGGATA 5160
ACGCAGGAAA GAACATGTGA GCAAAAGGCC AGCAAAAGGC CAGGAACCGT AAAAAGGCCG 5220
CGTTGCTGGC GTTTTTCCAT AGGCTCCGCC CCCCTGACGA GCATCACAAA AATCGACGCT 5280
CAAGTCAGAG GTGGCGAAAC CCGACAGGAC TATAAAGATA CCAGGCGTTT CCCCCTGGAA 5340



GCTCCCTCGT GCGCTCTCCT GTTCCGACCC TGCCGCTTAC CGGATACCTG TCCGCCTTTC 5400 

TCCCTTCGGG AAGCGTGGCG CTTTCTCAAT GCTCACGCTG TAGGTATCTC AGTTCGGTGT 5460 

AGGTCGTTCG CTCCAAGCTG GGCTGTGTGC ACGAACCCCC CGTTCAGCCC GACCGCTGCG 5520 

CCTTATCCGG TAACTATCGT CTTGAGTCCA ACCCGGTAAG ACACGACTTA TCGCCACTGG 5580 

CAGCAGCCAC TGGTAACAGG ATTAGCAGAG CGAGGTATGT AGGCGGTGCT ACAGAGTTCT 5640 

TGAAGTGGTG GCCTAACTAC GGCTACACTA GAAGGACAGT ATTTGGTATC TGCGCTCTGC 5700 

TGAAGCCAGT TACCTTCGGA AAAAGAGTTG GTAGCTCTTG ATCCGGCAAA CAAACCACCG 5760 

CTGGTAGCGG TGGTTTTTTT GTTTGCAAGC AGCAGATTAC GCGCAGAAAA AAAGGATCTC 5820

AAGAAGATCC TTTGATCTTT TCTACGGGGT CTGACGCTCA GTGGAACGAA AACTCACGTT 5880 

AAGGGATTTT GGTCATGAGA TTATCAAAAA GGATCTTCAC CTAGATCCTT TTAAATTAAA 5940 

AATGAAGTTT TAAATCAATC TAAAGTATAT ATGAGTAAAC TTGGTCTGAC AGTTACCAAT 6000

GCTTAATCAG TGAGGCACCT ATCTCAGCGA TCTGTCTATT TCGTTCATCC ATAGTTGCCT 6060

GACTCCCCGT CGTGTAGATA ACTACGATAC GGGAGGGCTT ACCATCTGGC CCCAGTGCTG 6120

CAATGATACC GCGAGACCCA CGCTCACCGG CTCCAGATTT ATCAGCAATA AACCAGCCAG 6180

CCGGAAGGGC CGAGCGCAGA AGTGGTCCTG CAACTTTATC CGCCTCCATC CAGTCTATTA 6240 

ATTGTTGCCG GGAAGCTAGA GTAAGTAGTT CGCCAGTTAA TAGTTTGCGC AACGTTGTTG 6300



CCATTGCTAC AGGCATCGTG GTGTCACGCT CGTCGTTTGG TATGGCTTCA TTCAGCTCCG 6360 

GTTCCCAACG ATCAAGGCGA GTTACATGAT CCCCCATGTT GTGCAAAAAA GCGGTTAGCT 6420

CCTTCGGTCC TCCGATCGTT GTCAGAAGTA AGTTGGCCGC AGTGTTATCA CTCATGGTTA 6480

TGGCAGCACT GCATAATTCT CTTACTGTCA TGCCATCCGT AAGATGCTTT TCTGTGACTG 6540 

GTGAGTACTC AACCAAGTCA TTCTGAGAAT AGTGTATGCG GCGACCGAGT TGCTCTTGCC 6600

CGGCGTCAAT ACGGGATAAT ACCGCGCCAC ATAGCAGAAC TTTAAAAGTG CTCATCATTG 6660 

GAAAACGTTC TTCGGGGCGA AAACTCTCAA GGATCTTACC GCTGTTGAGA TCCAGTTCGA 6720

TGTAACCCAC TCGTGCACCC AACTGATCTT CAGCATCTTT TACTTTCACC AGCGTTTCTG 6780

GGTGAGCAAA AACAGGAAGG CAAAATGCCG CAAAAAAGGG AATAAGGGCG ACACGGAAAT 6840 

GTTGAATACT CATACTCTTC CTTTTTCAAT ATTATTGAAG CATTTATCAG GGTTATTGTC 6900

TCATGAGCGG ATACATATTT GAATGTATTT AGAAAAATAA ACAAATAGGG GTTCCGCGCA 6960

CATTTCCCCG AAAAGTGCCA CCTGACGTCG ACGGATCGGG 7000 



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7108 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 
AGATCTCGGC CGCATATTAA GTGCATTGTT CTCGATACCG CTAAGTGCAT TGTTCTCGTT 60
AGCTCGATGG ACAAGTGCAT TGTTCTCTTG CTGAAAGCTC GATGGACAAG TGCATTGTTC 120
TCTTGCTGAA AGCTCGATGG ACAAGTGCAT TGTTCTCTTG CTGAAAGCTC AGTACCCGGG 180
AGTACCCTCG ACCGCCGGAG TATAAATAGA GGCGCTTCGT CTACGGAGCG ACAATTCAAT 240
TCAAACAAGC AAAGTGAACA CGTCGCTAAG CGAAAGCTAA GCAAATAAAC AAGCGCAGCT 300
GAACAAGCTA AACAATCTGC AGTAAAGTGC AAGTTAAAGT GAATCAATTA AAAGTAACCA 360
GCAACCAAGT AAATCAACTG CAACTACTGA AATCTGCCAA GAAGTAATTA TTGAATACAA 420
GAAGAGAACT CTGAATACTT TCAACAAGTT ACCGAGAAAG AAGAACTCAC ACACAGCTAG 480
CGTTTAAACT TAAGCTTGGT ACCGAGCTCG GATCCACTAG TCCAGTGTGG TGGAATTCGG 540
CTTGGGATGA CGCCTCCTCC GCCCGGACGT GCCGCCCCCA GCGCACCGCG CGCCCGCGTC 600
CCTGGCCCGC CGGCTCGGTT GGGGCTTCCG CTGCGGCTGC GGCTGCTGCT GCTGCTCTGG 660
GCGGCCGCCG CCTCCGCCCA GGGCCACCTA AGGAGCGGAC CCCGCATCTT CGCCGTCTGG 720
AAAGGCCATG TAGGGCAGGA CCGGGTGGAC TTTGGCCAGA CTGAGCCGCA CACGGTGCTT 780
TTCCACGAGC CAGGCAGCTC CTCTGTGTGG GTGGGAGGAC GTGGCAAGGT CTACCTCTTT 840
GACTTCCCCG AGGGCAAGAA CGCATCTGTG CGCACGGTGA ATATCGGCTC CACAAAGGGG 900
TCCTGTCTGG ATAAGCGGGA CTGCGAGAAC TACATCACTC TCCTGGAGAG GCGGAGTGAG 960
GGGCTGCTGG CCTGTGGCAC CAACGCCCGG CACCCCAGCT GCTGGAACCT GGTGAATGGC 1020
ACTGTGGTGC CACTTGGCGA GATGAGAGGC TACGCCCCCT TCAGCCCGGA CGAGAACTCC 1080
CTGGTTCTGT TTGAAGGGGA CGAGGTGTAT TCCACCATCC GGAAGCAGGA ATACAATGGG 1140
AAGATCCCTC GGTTCCGCCG CATCCGGGGC GAGAGTGAGC TGTACACCAG TGATACTGTC 1200
ATGCAGAACC CACAGTTCAT CAAAGCCACC ATCGTGCACC AAGACCAGGC TTACGATGAC 1260






AAGATCTACT ACTTCTTCCG AGAGGACAAT CCTGACAAGA ATCCTGAGGC TCCTCTCAAT 1320

GTGTCCCGTG TGGCCCAGTT GTGCAGGGGG GACCAGGGTG GGGAAAGTTC ACTGTCAGTC 1380 

TCCAAGTGGA ACACTTTTCT GAAAGCCATG CTGGTATGCA GTGATGCTGC CACCAACAAG 1440 

AACTTCAACA GGCTGCAAGA CGTCTTCCTG CTCCCTGACC CCAGCGGCCA GTGGAGGGAC 1500 

ACCAGGGTCT ATGGTGTTTT CTCCAACCCC TGGAACTACT CAGCCGTCTG TGTGTATTCC 1560

CTCGGTGACA TTGACAAGGT CTTCCGTACC TCCTCACTCA AGGGCTACCA CTCAAGCCTT 1620

CCCAACCCGC GGCCTGGCAA GTGCCTCCCA GACCAGCAGC CGATACCCAC AGAGACCTTC 1680

CAGGTGGCTG ACCGTCACCC AGAGGTGGCG CAGAGGGTGG AGCCCATGGG GCCTCTGAAG 1740

ACGCCATTGT TCCACTCTAA ATACCACTAC CAGAAAGTGG CCGTTCACCG CATGCAAGCC 1800 

AGCCACGGGG AGACCTTTCA TGTGCTTTAC CTAACTACAG ACAGGGGCAC TATCCACAAG 1860

GTGGTGGAAC CGGGGGAGCA GGAGCACAGC TTCGCCTTCA ACATCATGGA GATCCAGCCC 1920

TTCCGCCGCG CGGCTGCCAT CCAGACCATG TCGCTGGATG CTGAGCGGAG GAAGCTGTAT 1980

GTGAGCTCCC AGTGGGAGGT GAGCCAGGTG CCCCTGGACC TGTGTGAGGT CTATGGCGGG 2040

GGCTGCCACG GTTGCCTCAT GTCCCGAGAC CCCTACTGCG GCTGGGACCA GGGCCGCTGC 2100 

ATCTCCATCT ACAGCTCCGA ACGGTCAGTG CTGCAATCCA TTAATCCAGC CGAGCCACAC 2160

AAGGAGTGTC CCAACCCCAA ACCAGACAAG GCCCCACTGC AGAAGGTTTC CCTGGCCCCA 2220

AACTCTCGCT ACTACCTGAG CTGCCCCATG GAATCCCGCC ACGCCACCTA CTCATGGCGC 2280

CACAAGGAGA ACGTGGAGCA GAGCTGCGAA CCTGGTCACC AGAGCCCCAA CTGCATCCTG 2340

TTCATCGAGA ACCTCACGGC GCAGCAGTAC GGCCACTACT TCTGCGAGGC CCAGGAGGGC 2400 

TCCTACTTCC GCGAGGCTCA GCACTGGCAG CTGCTGCCCG AGGACGGCAT CATGGCCGAG 2460

CACCTGCTGG GTCATGCCTG TGCCCTGGCT GCCTCCCTCT GGCTGGGGGT GCTGCCCACA 2520 

CTCACTCTTG GCTTGCTGGT CCACGTGAAG CTTGGGCCCG AACAAAAACT CATCTCAGAA 2580 

GAGGATCTGA ATAGCGCCGT CGACCATCAT CATCATCATC ATTGAGTTTA TCCAGCACAG 2640 

TGGCGGCCGC TCGAGTCTAG AGGGCCCGTT TAAACCCGCT GATCAGCCTC GACTGTGCCT 2700

TCTAGTTGCC AGCCATCTGT TGTTTGCCCC TCCCCCGTGC CTTCCTTGAC CCTGGAAGGT 2760

GCCACTCCCA CTGTCCTTTC CTAATAAAAT GAGGAAATTG CATCGCATTG TCTGAGTAGG 2820

TGTCATTCTA TTCTGGGGGG TGGGGTGGGG CAGGACAGCA AGGGGGAGGA TTGGGAAGAC 2880

AATAGCAGGC ATGCTGGGGA TGCGGTGGGC TCTATGGCTT CTGAGGCGGA AAGAACCAGC 2940

TGGGGCTCTA GGGGGTATCC CCACGCGCCC TGTAGCGGCG CATTAAGCGC GGCGGGTGTG 3000



AAAGGTTGGG CTTCGGAATC GTTTTCCGGG AGGCCGGCTG GATGATCCTC CAGCGCGGGG 4740 

ATCTCATGCT GGAGTTCTTC GCCCACCCCA ACTTGTTTAT TGCAGCTTAT AATGGTTACA 4800

AATAAAGCAA TAGCATCACA AATTTCACAA ATAAAGCATT TTTTTCACTG CATTCTAGTT 4860 

GTGGTTTGTC CAAACTCATC AATGTATCTT ATCATGTCTG TATACCGTCG ACCTCTAGCT 4920 

AGAGCTTGGC GTAATCATGG TCATAGCTGT TTCCTGTGTG AAATTGTTAT CCGCTCACAA 4980 

TTCCACACAA CATACGAGCC GGAAGCATAA AGTGTAAAGC CTGGGGTGCC TAATGAGTGA 5040 

GCTAACTCAC ATTAATTGCG TTGCGCTCAC TGCCCGCTTT CCAGTCGGGA AACCTGTCGT 5100 

GCCAGCTGCA TTAATGAATC GGCCAACGCG CGGGGAGAGG CGGTTTGCGT ATTGGGCGCT 5160 

CTTCCGCTTC CTCGCTCACT GACTCGCTGC GCTCGGTCGT TCGGCTGCGG CGAGCGGTAT 5220

CAGCTCACTC AAAGGCGGTA ATACGGTTAT CCACAGAATC AGGGGATAAC GCAGGAAAGA 5280

ACATGTGAGC AAAAGGCCAG CAAAAGGCCA GGAACCGTAA AAAGGCCGCG TTGCTGGCGT 5340

TTTTCCATAG GCTCCGCCCC CCTGACGAGC ATCACAAAAA TCGACGCTCA AGTCAGAGGT 5400

GGCGAAACCC GACAGGACTA TAAAGATACC AGGCGTTTCC CCCTGGAAGC TCCCTCGTGC 5460

GCTCTCCTGT TCCGACCCTG CCGCTTACCG GATACCTGTC CGCCTTTCTC CCTTCGGGAA 5520

GCGTGGCGCT TTCTCAATGC TCACGCTGTA GGTATCTCAG TTCGGTGTAG GTCGTTCGCT 5580

CCAAGCTGGG CTGTGTGCAC GAACCCCCCG TTCAGCCCGA CCGCTGCGCC TTATCCGGTA 5640

ACTATCGTCT TGAGTCCAAC CCGGTAAGAC ACGACTTATC GCCACTGGCA GCAGCCACTG 5700

GTAACAGGAT TAGCAGAGCG AGGTATGTAG GCGGTGCTAC AGAGTTCTTG AAGTGGTGGC 5760

CTAACTACGG CTACACTAGA AGGACAGTAT TTGGTATCTG CGCTCTGCTG AAGCCAGTTA 5820

CCTTCGGAAA AAGAGTTGGT AGCTCTTGAT CCGGCAAACA AACCACCGCT GGTAGCGGTG 5880 

GTTTTTTTGT TTGCAAGCAG CAGATTACGC GCAGAAAAAA AGGATCTCAA GAAGATCCTT 5940

TGATCTTTTC TACGGGGTCT GACGCTCAGT GGAACGAAAA CTCACGTTAA GGGATTTTGG 6000 

TCATGAGATT ATCAAAAAGG ATCTTCACCT AGATCCTTTT AAATTAAAAA TGAAGTTTTA 6060 

AATCAATCTA AAGTATATAT GAGTAAACTT GGTCTGACAG TTACCAATGC TTAATCAGTG 6120 

AGGCACCTAT CTCAGCGATC TGTCTATTTC GTTCATCCAT AGTTGCCTGA CTCCCCGTCG 6180 

TGTAGATAAC TACGATACGG GAGGGCTTAC CATCTGGCCC CAGTGCTGCA ATGATACCGC 6240 

GAGACCCACG CTCACCGGCT CCAGATTTAT CAGCAATAAA CCAGCCAGCC GGAAGGGCCG 6300

AGCGCAGAAG TGGTCCTGCA ACTTTATCCG CCTCCATCCA GTCTATTAAT TGTTGCCGGG 6360 

AAGCTAGAGT AAGTAGTTCG CCAGTTAATA GTTTGCGCAA CGTTGTTGCC ATTGCTACAG 6420 



GCATCGTGGT GTCACGCTCG TCGTTTGGTA TGGCTTCATT CAGCTCCGGT TCCCAACGAT 6480 

CAAGGCGAGT TACATGATCC CCCATGTTGT GCAAAAAAGC GGTTAGCTCC TTCGGTCCTC 6540

CGATCGTTGT CAGAAGTAAG TTGGCCGCAG TGTTATCACT CATGGTTATG GCAGCACTGC 6600 

ATAATTCTCT TACTGTCATG CCATCCGTAA GATGCTTTTC TGTGACTGGT GAGTACTCAA 6660 

CCAAGTCATT CTGAGAATAG TGTATGCGGC GACCGAGTTG CTCTTGCCCG GCGTCAATAC 6720 

GGGATAATAC CGCGCCACAT AGCAGAACTT TAAAAGTGCT CATCATTGGA AAACGTTCTT 6780 

CGGGGCGAAA ACTCTCAAGG ATCTTACCGC TGTTGAGATC CAGTTCGATG TAACCCACTC 6840 

GTGCACCCAA CTGATCTTCA GCATCTTTTA CTTTCACCAG CGTTTCTGGG TGAGCAAAAA 6900 

CAGGAAGGCA AAATGCCGCA AAAAAGGGAA TAAGGGCGAC ACGGAAATGT TGAATACTCA 6960

TACTCTTCCT TTTTCAATAT TATTGAAGCA TTTATCAGGG TTATTGTCTC ATGAGCGGAT 7020

ACATATTTGA ATGTATTTAG AAAAATAAAC AAATAGGGGT TCCGCGCACA TTTCCCCGAA 7080 

AAGTGCCACC TGACGTCGAC GGATCGGG 7108

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4019 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

CTCGAGAAAT CATAAAAAAT TTATTTGCTT TGTGAGCGGA TAACAATTAT AATAGATTCA 60 

ATTGTGAGCG GATAACAATT TCACACAGAA TTCATTAAAG AGGAGAAATT AACTATGAGA 120 

GGATCGCATC ACCATCACCA TCACGGATCC CTGGTTCTGT TTGAAGGGGA CGAGGTGTAT 180 

TCCACCATCC GGAAGCAGGA ATACAATGGG AAGATCCCTC GGTTCCGCCG CATCCGGGGC 240 

GAGAGTGAGC TGTACACCAG TGATACTGTC ATGCAGAACC CACAGTTCAT CAAAGCCACC 300

ATCGTGCACC AAGACCAGGC TTACGATGAC AAGATCTACT ACTTCTTCCG AGAGGACAAT 360 

CCTGACAAGA ATCCTGAGGC TCCTCTCAAT GTGTCCCGTG TGGCCCAGTT GTGCAGGGGG 420 

GACCAGGGTG GGGAAAGTTC ACTGTCAGTC TCCAAGTGGA ACACTTTTCT GAAAGCCATG 480 

CTGGTATGCA GTGATGCTGC CACCAACAAG AACTTCAACA GGCTGCAAGA CGTCTTCCTG 540



CTCCCTGACC CCAGCGGCCA GTGGAGGGAC ACCAGGGTCT ATGGTGTTTT CTCCAACCCC 600
TGGAACTACT CAGCCGTCTG TGTGTATTCC CTCGGTGACA TTGACAAGGT CTTCCGTACC 660
TCCTCACTCA AGGGCTACCA CTCAAGCCTT CCCAACCCGC GGCCTGGCAA GTGCCTCCCA 720
GACCAGCAGC CGATACCCAC AGAAAGCTTA ATTAGCTGAG CTTGGACTCC TGTTGATAGA 780
TCCAGTAATG ACCTCAGAAC TCCATCTGGA TTTGTTCAGA ACGCTCGGTT GCCGCCGGGC 840
GTTTTTTATT GGTGAGAATC CAAGCTAGCT TGGCGAGATT TTCAGGAGCT AAGGAAGCTA 900
AAATGGAGAA AAAAATCACT GGATATACCA CCGTTGATAT ATCCCAATGG CATCGTAAAG 960
AACATTTTGA GGCATTTCAG TCAGTTGCTC AATGTACCTA TAACCAGACC GTTCAGCTGG 1020
ATATTACGGC CTTTTTAAAG ACCGTAAAGA AAAATAAGCA CAAGTTTTAT CCGGCCTTTA 1080
TTCACATTCT TGCCCGCCTG ATGAATGCTC ATCCGGAATT TCGTATGGCA ATGAAAGACG 1140
GTGAGCTGGT GATATGGGAT AGTGTTCACC CTTGTTACAC CGTTTTCCAT GAGCAAACTG 1200
AAACGTTTTC ATCGCTCTGG AGTGAATACC ACGACGATTT CCGGCAGTTT CTACACATAT 1260
ATTCGCAAGA TGTGGCGTGT TACGGTGAAA ACCTGGCCTA TTTCCCTAAA GGGTTTATTG 1320
AGAATATGTT TTTCGTCTCA GCCAATCCCT GGGTGAGTTT CACCAGTTTT GATTTAAACG 1380
TGGCCAATAT GGACAACTTC TTCGCCCCCG TTTTCACCAT GGGCAAATAT TATACGCAAG 1440
GCGACAAGGT GCTGATGCCG CTGGCGATTC AGGTTCATCA TGCCGTCTGT GATGGCTTCC 1500
ATGTCGGCAG AATGCTTAAT GAATTACAAC AGTACTGCGA TGAGTGGCAG GGCGGGGCGT 1560
AATTTTTTTA AGGCAGTTAT TGGTGCCCTT AAACGCCTGG GGTAATGACT CTCTAGCTTG 1620
AGGCATCAAA TAAAACGAAA GGCTCAGTCG AAAGACTGGG CCTTTCGTTT TATCTGTTGT 1680
TTGTCGGTGA ACGCTCTCCT GAGTAGGACA AATCCGCCGC TCTAGAGCTG CCTCGCGCGT 1740
TTCGGTGATG ACGGTGAAAA CCTCTGACAC ATGCAGCTCC CGGAGACGGT CACAGCTTGT 1800
CTGTAAGCGG ATGCCGGGAG CAGACAAGCC CGTCAGGGCG CGTCAGCGGG TGTTGGCGGG 1860
TGTCGGGGCG CAGCCATGAC CCAGTCACGT AGCGATAGCG GAGTGTATAC TGGCTTAACT 1920
ATGCGGCATC AGAGCAGATT GTACTGAGAG TGCACCATAT GCGGTGTGAA ATACCGCACA 1980
GATGCGTAAG GAGAAAATAC CGCATCAGGC GCTCTTCCGC TTCCTCGCTC ACTGACTCGC 2040
TGCGCTCGGT CTGTCGGCTG CGGCGAGCGG TATCAGCTCA CTCAAAGGCG GTAATACGGT 2100
TATCCACAGA ATCAGGGGAT AACGCAGGAA AGAACATGTG AGCAAAAGGC CAGCAAAAGG 2160
CCAGGAACCG TAAAAAGGCC GCGTTGCTGG CGTTTTTCCA TAGGCTCCGC CCCCCTGACG 2220



AGCATCACAA AAATCGACGC TCAAGTCAGA GGTGGCGAAA CCCGACAGGA CTATAAAGAT 2280
ACCAGGCGTT TCCCCCTGGA AGCTCCCTCG TGCGCTCTCC TGTTCCGACC CTGCCGCTTA 2340
CCGGATACCT GTCCGCCTTT CTCCCTTCGG GAAGCGTGGC GCTTTCTCAA TGCTCACGCT 2400
GTAGGTATCT CAGTTCGGTG TAGGTCGTTC GCTCCAAGCT GGGCTGTGTG CACGAACCCC 2460
CCGTTCAGCC CGACCGCTGC GCCTTATCCG GTAACTATCG TCTTGAGTCC AACCCGGTAA 2520
GACACGACTT ATCGCCACTG GCAGCAGCCA CTGGTAACAG GATTAGCAGA GCGAGGTATG 2580
TAGGCGGTGC TACAGAGTTC TTGAAGTGGT GGCCTAACTA CGGCTACACT AGAAGGACAG 2640
TATTTGGTAT CTGCGCTCTG CTGAAGCCAG TTACCTTCGG AAAAAGAGTT GGTAGCTCTT 2700
GATCCGGCAA ACAAACCACC GCTGGTAGCG GTGGTTTTTT TGTTTGCAAG CAGCAGATTA 2760
CGCGCAGAAA AAAAGGATCT CAAGAAGATC CTTTGATCTT TTCTACGGGG TCTGACGCTC 2820
AGTGGAACGA AAACTCACGT TAAGGGATTT TGGTCATGAG ATTATCAAAA AGGATCTTCA 2880
CCTAGATCCT TTTAAATTAA AAATGAAGTT TTAAATCAAT CTAAAGTATA TATGAGTAAA 2940
CTTGGTCTGA CAGTTACCAA TGCTTAATCA GTGAGGCACC TATCTCAGCG ATCTGTCTAT 3000
TTCGTTCATC CATAGCTGCC TGACTCCCCG TCGTGTAGAT AACTACGATA CGGGAGGGCT 3060
TACCATCTGG CCCCAGTGCT GCAATGATAC CGCGAGACCC ACGCTCACCG GCTCCAGATT 3120
TATCAGCAAT AAACCAGCCA GCCGGAAGGG CCGAGCGCAG AAGTGGTCCT GCAACTTTAT 3180
CCGCCTCCAT CCAGTCTATT AATTGTTGCC GGGAAGCTAG AGTAAGTAGT TCGCCAGTTA 3240
ATAGTTTGCG CAACGTTGTT GCCATTGCTA CAGGCATCGT GGTGTCACGC TCGTCGTTTG 3300
GTATGGCTTC ATTCAGCTCC GGTTCCCAAC GATCAAGGCG AGTTACATGA TCCCCCATGT 3360
TGTGCAAAAA AGCGGTTAGC TCCTTCGGTC CTCCGATCGT TGTCAGAAGT AAGTTGGCCG 3420
CAGTGTTATC ACTCATGGTT ATGGCAGCAC TGCATAATTC TCTTACTGTC ATGCCATCCG 3480
TAAGATGCTT TTCTGTGACT GGTGAGTACT CAACCAAGTC ATTCTGAGAA TAGTGTATGC 3540
GGCGACCGAG TTGCTCTTGC CCGGCGTCAA TACGGGATAA TACCGCGCCA CATAGCAGAA 3600
CTTTAAAAGT GCTCATCATT GGAAAACGTT CTTCGGGGCG AAAACTCTCA AGGATCTTAC 3660
CGCTGTTGAG ATCCAGTTCG ATGTAACCCA CTCGTGCACC CAACTGATCT TCAGCATCTT 3720
TTACTTTCAC CAGCGTTTCT GGGTGAGCAA AAACAGGAAG GCAAAATGCC GCAAAAAAGG 3780
GAATAAGGGC GACACGGAAA TGTTGAATAC TCATACTCTT CCTTTTTCAA TATTATTGAA 3840
GCATTTATCA GGGTTATTGT CTCATGAGCG GATACATATT TGAATGTATT TAGAAAAATA 3900
AACAAATAGG GGTTCCGCGC ACATTTCCCC GAAAAGTGCC ACCTGACGTC TAAGAAACCA 3960



TTATTATCAT GACATTAACC TATAAAAATA GGCGTATCAC GAGGCCCTTT CGTCTTCAC 4019 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3999 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:

CTCGAGAAAT CATAAAAAAT TTATTTGCTT TGTGAGCGGA TAACAATTAT AATAGATTCA 60 

ATTGTGAGCG GATAACAATT TCACACAGAA TTCATTAAAG AGGAGAAATT AACTATGAGA 120 

GGATCGCATC ACCATCACCA TCACACGGAT CCGCATGCGA GCTCCCAGTG GGAGGTGAGC 180

CAGGTGCCCC TGGACCTGTG TGAGGTCTAT GGCGGGGGCT GCCACGGTTG CCTCATGTCC 240

CGAGACCCCT ACTGCGGCTG GGACCAGGGC CGCTGCATCT CCATCTACAG CTCCGAACGG 300

TCAGTGCTGC AATCCATTAA TCCAGCCGAG CCACACAAGG AGTGTCCCAA CCCCAAACCA 360



GACAAGGCCC CACTGCAGAA GGTTTCCCTG GCCCCAAACT CTCGCTACTA CCTGAGCTGC 420 
CCCATGGAAT CCCGCCACGC CACCTACTCA TGGCGCCACA AGGAGAACGT GGAGCAGAGC 480 



TGCGAACCTG GTCACCAGAG CCCCAACTGC ATCCTGTTCA TCGAGAACCT CACGGCGCAG 540



CAGTACGGCC ACTACTTCTG CGAGGCCCAG GAGGGCTCCT ACTTCCGCGA GGCTCAGCAC 600 

TGGCAGCTGC TGCCCGAGGA CGGCATCATG GCCGAGCACC TGCTGGGTCA TGCCTGTGCC 660

CTGGCTGCCT CCCTCTGGCT GGGGGTGCTG CCCACACTCA CTCTTGGCTT GCTGGTCCAC 720

GTGAAGCTTA ATTAGCTGAG CTTGGACTCC TGTTGATAGA TCCAGTAATG ACCTCAGAAC 780

TCCATCTGGA TTTGTTCAGA ACGCTCGGTT GCCGCCGGGC GTTTTTTATT GGTGAGAATC 840

CAAGCTAGCT TGGCGAGATT TTCAGGAGCT AAGGAAGCTA AAATGGAGAA AAAAATCACT 900 

GGATATACCA CCGTTGATAT ATCCCAATGG CATCGTAAAG AACATTTTGA GGCATTTCAG 960 

TCAGTTGCTC AATGTACCTA TAACCAGACC GTTCAGCTGG ATATTACGGC CTTTTTAAAG 1020 

ACCGTAAAGA AAAATAAGCA CAAGTTTTAT CCGGCCTTTA TTCACATTCT TGCCCGCCTG 1080 

ATGAATGCTC ATCCGGAATT TCGTATGGCA ATGAAAGACG GTGAGCTGGT GATATGGGAT 1140 

AGTGTTCACC CTTGTTACAC CGTTTTCCAT GAGCAAACTG AAACGTTTTC ATCGCTCTGG 1200

AGTGAATACC ACGACGATTT CCGGCAGTTT CTACACATAT ATTCGCAAGA TGTGGCGTGT 1260

TACGGTGAAA ACCTGGCCTA TTTCCCTAAA GGGTTTATTG AGAATATGTT TTTCGTCTCA 1320 

GCCAATCCCT GGGTGAGTTT CACCAGTTTT GATTTAAACG TGGCCAATAT GGACAACTTC 1380 

TTCGCCCCCG TTTTCACCAT GGGCAAATAT TATACGCAAG GCGACAAGGT GCTGATGCCG 1440 

CTGGCGATTC AGGTTCATCA TGCCGTCTGT GATGGCTTCC ATGTCGGCAG AATGCTTAAT 1500 

GAATTACAAC AGTACTGCGA TGAGTGGCAG GGCGGGGCGT AATTTTTTTA AGGCAGTTAT 1560 

TGGTGCCCTT AAACGCCTGG GGTAATGACT CTCTAGCTTG AGGCATCAAA TAAAACGAAA 1620 

GGCTCAGTCG AAAGACTGGG CCTTTCGTTT TATCTGTTGT TTGTCGGTGA ACGCTCTCCT 1680 

GAGTAGGACA AATCCGCCGC TCTAGAGCTG CCTCGCGCGT TTCGGTGATG ACGGTGAAAA 1740 

CCTCTGACAC ATGCAGCTCC CGGAGACGGT CACAGCTTGT CTGTAAGCGG ATGCCGGGAG 1800 

CAGACAAGCC CGTCAGGGCG CGTCAGCGGG TGTTGGCGGG TGTCGGGGCG CAGCCATGAC 1860 

CCAGTCACGT AGCGATAGCG GAGTGTATAC TGGCTTAACT ATGCGGCATC AGAGCAGATT 1920 

GTACTGAGAG TGCACCATAT GCGGTGTGAA ATACCGCACA GATGCGTAAG GAGAAAATAC 1980 

CGCATCAGGC GCTCTTCCGC TTCCTCGCTC ACTGACTCGC TGCGCTCGGT CTGTCGGCTG 2040 

CGGCGAGCGG TATCAGCTCA CTCAAAGGCG GTAATACGGT TATCCACAGA ATCAGGGGAT 2100 

AACGCAGGAA AGAACATGTG AGCAAAAGGC CAGCAAAAGG CCAGGAACCG TAAAAAGGCC 2160 

GCGTTGCTGG CGTTTTTCCA TAGGCTCCGC CCCCCTGACG AGCATCACAA AAATCGACGC 2220 

TCAAGTCAGA GGTGGCGAAA CCCGACAGGA CTATAAAGAT ACCAGGCGTT TCCCCCTGGA 2280 



AGCTCCCTCG TGCGCTCTCC TGTTCCGACC CTGCCGCTTA CCGGATACCT GTCCGCCTTT 2340 

CTCCCTTCGG GAAGCGTGGC GCTTTCTCAA TGCTCACGCT GTAGGTATCT CAGTTCGGTG 2400 

TAGGTCGTTC GCTCCAAGCT GGGCTGTGTG CACGAACCCC CCGTTCAGCC CGACCGCTGC 2460 

GCCTTATCCG GTAACTATCG TCTTGAGTCC AACCCGGTAA GACACGACTT ATCGCCACTG 2520 

GCAGCAGCCA CTGGTAACAG GATTAGCAGA GCGAGGTATG TAGGCGGTGC TACAGAGTTC 2580 

TTGAAGTGGT GGCCTAACTA CGGCTACACT AGAAGGACAG TATTTGGTAT CTGCGCTCTG 2640 

CTGAAGCCAG TTACCTTCGG AAAAAGAGTT GGTAGCTCTT GATCCGGCAA ACAAACCACC 2700 

GCTGGTAGCG GTGGTTTTTT TGTTTGCAAG CAGCAGATTA CGCGCAGAAA AAAAGGATCT 2760 

CAAGAAGATC CTTTGATCTT TTCTACGGGG TCTGACGCTC AGTGGAACGA AAACTCACGT 2820 

TAAGGGATTT TGGTCATGAG ATTATCAAAA AGGATCTTCA CCTAGATCCT TTTAAATTAA 2880 



AAATGAAGTT TTAAATCAAT CTAAAGTATA TATGAGTAAA CTTGGTCTGA CAGTTACCAA 
TGCTTAATCA GTGAGGCACC TATCTCAGCG ATCTGTCTAT TTCGTTCATC CATAGCTGCC 
TGACTCCCCG TCGTGTAGAT AACTACGATA CGGGAGGGCT TACCATCTGG CCCCAGTGCT 
GCAATGATAC CGCGAGACCC ACGCTCACCG GCTCCAGATT TATCAGCAAT AAACCAGCCA 
GCCGGAAGGG CCGAGCGCAG AAGTGGTCCT GCAACTTTAT CCGCCTCCAT CCAGTCTATT 
AATTGTTGCC GGGAAGCTAG AGTAAGTAGT TCGCCAGTTA ATAGTTTGCG CAACGTTGTT 
GCCATTGCTA CAGGCATCGT GGTGTCACGC TCGTCGTTTG GTATGGCTTC ATTCAGCTCC 
GGTTCCCAAC GATCAAGGCG AGTTACATGA TCCCCCATGT TGTGCAAAAA AGCGGTTAGC 
TCCTTCGGTC CTCCGATCGT TGTCAGAAGT AAGTTGGCCG CAGTGTTATC ACTCATGGTT 
ATGGCAGCAC TGCATAATTC TCTTACTGTC ATGCCATCCG TAAGATGCTT TTCTGTGACT 
GGTGAGTACT CAACCAAGTC ATTCTGAGAA TAGTGTATGC GGCGACCGAG TTGCTCTTGC 
CCGGCGTCAA TACGGGATAA TACCGCGCCA CATAGCAGAA CTTTAAAAGT GCTCATCATT 
GGAAAACGTT CTTCGGGGCG AAAACTCTCA AGGATCTTAC CGCTGTTGAG ATCCAGTTCG 
ATGTAACCCA CTCGTGCACC CAACTGATCT TCAGCATCTT TTACTTTCAC CAGCGTTTCT 
GGGTGAGCAA AAACAGGAAG GCAAAATGCC GCAAAAAAGG GAATAAGGGC GACACGGAAA 
TGTTGAATAC TCATACTCTT CCTTTTTCAA TATTATTGAA GCATTTATCA GGGTTATTGT 
CTCATGAGCG GATACATATT TGAATGTATT TAGAAAAATA AACAAATAGG GGTTCCGCGC 
ACATTTCCCC GAAAAGTGCC ACCTGACGTC TAAGAAACCA TTATTATCAT GACATTAACC 
TATAAAAATA GGCGTATCAC GAGGCCCTTT CGTCTTCAC 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8888 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
GAGCCGCACA CGGTGCTTTT CCACGAGCCA GGCAGCTCCT CTGTGTGGGT GGGAGGACGT 
GGCAAGGTCT ACCTCTTTGA CTTCCCCGAG GGCAAGAACG CATCTGTGCG CACGGTGAGC 






CTCTCTCTTC CCCCAACACC CCCCCTACCC TCTTATCTCC CCTCTGGCCC TGCCAAGGGT 180 

CCTCAGGGAA TCCGAGGGAG CTGGCTTCTC TTCCTAAACT GCCCCCACCT CCGTATCCTA 240 

TAAATGGCTC CTGGGGGAGG CTCCCTAAAG GTAGTCCAGA TTGGAGTGGG GAGCTGGGGC 300 

GGTGTGGAGA AAAACAGGAG CTAATGGGCC TGGCCAGCTG GGCAGCGCTG CTGCGGAAAG 360 

CCCAGGCTGG AAGCTGGGCC CCAGAGCCCA TGCCTGGTCT TCTGAACCCT CTGGGCCTCA 420 

GCTCTGGATA TGAGACCCTG TTTGACCTCA GGTAGATCAC TCACCCTCTC AGAGCCCCAG 480 

TTGCTCATCT GTCAGATGAG AATAATGGTT GCTTCCTTTG GGGCTTATCC TGAGGCTGTG 540 

TGGAAAGCAT TTCAGGGGTA CCTCACCCCT GGCAGATTGA ACTAATGCTT CTCCCCTTCC 600 

CCAGGTGAAT ATCGGCTCCA CAAAGGGGTC CTGTCTGGAT AAGCGGGTGA GCGGGGGAGG 660 

GATCTGGAGG GGTCTGAGCC ACTTGGTAAA GGGAGAGGAG ACCCTGAGGG TCTAAGGAAG 720 

GAAGCATGGC CCTGCCCCAC GAGTCCCAGA CTGATGGGGA GACGTGGTCC TCTGTGCTTA 780 

GGGGATGGCG TCAGCTGCAC ACACTCTGGG CTGTCCCGGG AGGCTGTCAC CTATGCTAAG 840 

CCCTTCTGAC ACCTTCTTCC CTGATCCTGG GGGTCCTAGT GCTAGGCTTG CCAGGGCCTT 900 

CCAGCAACCA ATTTCTCTCC TCCCTTCTCT CTTCCCCGGG CAGGACTGCG AGAACTACAT 960 

CACTCTCCTG GAGAGGCGGA GTGAGGGGCT GCTGGCCTGT GGCACCAACG CCCGGCACCC 1020 

CAGCTGCTGG AACCTGGTGA GAAGGCTGCT CCCCATGTGC CTGATCAGCT CACCTTCTAC 1080 

TGCGTGGGCT TCTGCCCCTC ATGGTGGGAA GGAGATGGCG AGACTCCAAT GCTGGCCTTG 1140 

CCCTGGGAGG ATGGGGCTCC TGGCCGAGAA ACTGGCCGTC ATGGGAGGCA GTGGCTGTGG 1200 

GATTATGTGG CCATCCAACC CTCTGGATCT CCCACAGGTG AATGGCACTG TGGTGCCACT 1260 

TGGCGAGATG AGAGGCTACG CCCCCTTCAG CCCGGACGAG AACTCCCTGG TTCTGTTTGA 1320 

AGGTTGGGGC ATGCTTCGGA ACTGGGCTGG GAGCAGGATG GTCAGCTCTT TGTCCAGTGT 1380 

CCGGAGGAGG GACTTCCAGG AGCTGCCTGC CCTTACTCAT TTCTCCCTCC CACTGACCCC 1440 

AGGGGACGAG GTGTATTCCA CCATCCGGAA GCAGGAATAC AATGGGAAGA TCCCTCGGTT 1500 

CCGCCGCATC CGGGGCGAGA GTGAGCTGTA CACCAGTGAT ACTGTCATGC AGAGTGAGTC 1560 

AGGCTCCGGC TGGGCTGAGG GTGGGCAAGG GGGTGTGAGC ACTTAAGGTG GCAGATGGGA 1620 

TCCTGATGTT TCTGGGAGGG CTCCCTGAGG GCCGCTGGGG CCATGCAGGA AAGCAGGACC 1680 

TTGGTATAGG CCTGAGAAGT TAGGGTTGGC TGGGAGCAGA GGAACAGACA AGGTATAGCA 1740 

GTGGGATGGG CCCAGCCCTC TTCAGGAACA CAAACAGAGG GAGCCCCAGA CCCAGTGCAG 1800 

GGTCCCCAGG AGCCAAAGTT TATCCTCTGC TGAGTTCACG TGGAGGCAGC CCCCCAACTC 1860 



TAGAGTGGAG GAAGCCAAGA CCCTGCTCTG TGGCTCCTGG GTGAGTGGGT CCCCCAGGCT 3600 

GGGAAGGGGT TGGGGGTCTG GCCTCCTGGG GCATCAGCAC CCCACAGCCT GTGCCCAGGG 3660 

AGGGCTAGAG AACTGCTCAG CCTATGATGG GGTTCCTCCT GCCTTGGGGT TGGGTAGAGC 3720 

AGATGGCCTC TAGACTCAGT GATTCTGTAA CAGGATACAA GTTTGTGGTT TTAAATTGCA 3780 

GCACAAAGAA ATTAGGCTGA ACTCCTCTCC TTCCTCCTCT CCATCCCTCC CCATTTTCAG 3840 

TGGTGGTTGG CAACTCAGTG CCAGGCACAA GGCTGGCCTG GGTGAGTGGA GGTGGATGGG 3900 

TGGGTTCTGG GCCCCCCATT GAGCTGGTCT CCATGTCACT GCAGGAACTA CTCAGCCGTC 3960 

TGTGTGTATT CCCTCGGTGA CATTGACAAG GTCTTCCGTA CCTCCTCACT CAAGGGCTAC 4020 

CACTCAAGCC TTCCCAACCC GCGGCCTGGC AAGGTGAGCG TGACACCAGC CGTGGCCCAG 4080 

GCCCAGCCCT CCTTCTGCCT CACCTCCCAC CACCCCACTG ACCTGGGCCT GCTCTCCTTG 4140 

CCCAGTGCCT CCCAGACCAG CAGCCGATAC CCACAGAGAC CTTCCAGGTG GCTGACCGTC 4200 

ACCCAGAGGT GGCGCAGAGG GTGGAGCCCA TGGGGCCTCT GAAGACGCCA TTGTTCCACT 4260 

CTAAATACCA CTACCAGAAA GTGGCCGTCC ACCGCATGCA AGCCAGCCAC GGGGAGACCT 4320 

TTCATGTGCT TTACCTAACT ACAGGTGAGA GGCTACCCCG GGACCCTCAG TTTGCTTTGT 4380 

AAAAACGGGC ATGAAAGGTG TAAGGAATAA TGTAGTTAAC ATCTGGTTGG ATCTTTACAT 4440 

GTGGAAGGAA TAATTGAGTG ACTGGAGTTG TCAGGGGTTA ATGTGTGTGG GTGTGGAAGA 4500 

GCCAGGCAGG GAGAGCTTCC TGGAGGAGGT AGGGGCAAGA GGGAAAGGGG GATGGGAGAA 4560 

AAGCAAGCAC TGGGATTTGG AGGCGGAAAT CTGGAGAGTC TGAGCAAAGC CAGGTGCACC 4620 

TTTGGTCCAG ATGTCTGACT CAGGGAAGAA GATGGTAGGA AGAGACGTGG CAAATGAGGA 4680 

GGAGGGGCCT GAACCACAGG GATACTGGCC TCTGCCAGGC AGAATGAGGG AGTCAGGCCC 4740 

TGCGCCTGTC TTTGGGATTG TGCAGGTGAG AAGAAACATT TGAGGAGTTG ATGGGGCACA 4800 

AATTAGGTAT GGGGAAGGAG TTCCAGGGGG CAGAACCTTT GCCATCTCAC AGAGGACAGG 4860 

GGCAGCTTCT CTTCTTCCCT GGAGTAGGCC CTGCTGGGGG AAGCTGGGTG GAATGCCGTG 4920 

GGAGATGCTC CTGCTTTCTG GAAAGCCACA GGACACGGAG GAGCCAGTCC TGAGTTGGGT 4980 

TTGTCGCAGC TTCCCATGCC AGCTGCCTTC CTTGAGACTG GAAAGGGCCT CTAGCACCCC 5040 

TGGGGCCATT CAATTCAGGC CCAGGCGCCC AACCTCAGTT GTTCACATTC CCCATGTGAT 5100 

CTCCTGTTGC TGCTTCACCT TGGGACTGTC TCGGCTTTGG TGACCTTGTA GGAAACTGGA 5160 

ACCCCAGCAC CATTGTTTGG CTCCTGGAAG CCTTGGGGAG AGGAATTTCC CACAGGGCAG 5220 

GGCCTGGGTC CTGATTCCCT GCCTCTTTAC TCCCTATTCA TCCCGGCTAC ACCCTTGGGC 5280 






TCTCTCATGG CTGGGTGGCT GGTGTTCCTG AAGACCCAGG GCTACCCTCT GTCCAGCCCT 
GTCCTCTGCA GCTCCCTCTC TGGTCCTGGG TCCCACAGGA CAGCCGCCTT GCATGTTTAT 

TGAAGGATGT TTGCTTTCCG GACGGAAGGA CGGAAAAAGC TCTGAAAAAA AAAAAAAAAA 


AAAAAAAA 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6622 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: 
GATATCATGG AGATAATTAA AATGATAACC ATCTCGCAAA TAAATAAGTA TTTTACTGTT 
TTCGTAACAG TTTTGTAATA AAAAAACCTA TAAATATGAA ATTCTTAGTC AACGTTGCCC 
TTGTTTTTAT GGTCGTATAC ATTTCTTACA TCTATGCGGA TCGATGGGGA TCCGCCCAGG 
GCCACCTAAG GAGCGGACCC CGCATCTTCG CCGTCTGGAA AGGCCATGTA GGGCAGGACC 
GGGTGGACTT TGGCCAGACT GAGCCGCACA CGGTGCTTTT CCACGAGCCA GGCAGCTCCT 
CTGTGTGGGT GGGAGGACGT GGCAAGGTCT ACCTCTTTGA CTTCCCCGAG GGCAAGAACG 
CATCTGTGCG CACGGTGAAT ATCGGCTCCA CAAAGGGGTC CTGTCTGGAT AAGCGGGACT 
GCGAGAACTA CATCACTCTC CTGGAGAGGC GGAGTGAGGG GCTGCTGGCC TGTGGCACCA 
ACGCCCGGCA CCCCAGCTGC TGGAACCTGG TGAATGGCAC TGTGGTGCCA CTTGGCGAGA 
TGAGAGGCTA TGCCCCCTTC AGCCCGGACG AGAACTCCCT GGTTCTGTTT GAAGGGGACG 
AGGTGTATTC CACCATCCGG AAGCAGGAAT ACAATGGGAA GATCCCTCGG TTCCGCCGCA 
TCCGGGGCGA GAGTGAGCTG TACACCAGTG ATACTGTCAT GCAGAACCCA CAGTTCATCA 
AAGCCACCAT CGTGCACCAA GACCAGGCTT ACGATGACAA GATCTACTAC TTCTTCCGAG 
AGGACAATCC TGACAAGAAT CCTGAGGCTC CTCTCAATGT GTCCCGTGTG GCCCAGTTGT 
GCAGGGGGGA CCAGGGTGGG GAAAGTTCAC TGTCAGTCTC CAAGTGGAAC ACTTTTCTGA 
AAGCCATGCT GGTATGCAGT GATGCTGCCA CCAACAAGAA CTTCAACAGG CTGCAAGACG 
TCTTCCTGCT CCCTGACCCC AGCGGCCAGT GGAGGGACAC CAGGGTCTAT GGTGTTTTCT 






CCAACCCCTG GAACTACTCA GCCGTCTGTG TGTATTCCCT CGGTGACATT GACAAGGTCT 1080 

TCCGTACCTC CTCACTCAAG GGCTACCACT CAAGCCTTCC CAACCCGCGG CCTGGCAAGT 1140 

GCCTCCCAGA CCAGCAGCCG ATACCCACAG AGACCTTCCA GGTGGCTGAC CGTCACCCAG 1200 

AGGTGGCGCA GAGGGTGGAG CCCATGGGGC CTCTGAAGAC GCCATTGTTC CACTCTAAAT 1260 

ACCACTACCA GAAAGTGGCC GTTCACCGCA TGCAAGCCAG CCACGGGGAG ACCTTTCATG 1320 

TGCTTTACCT AACTACAGAC AGGGGCACTA TCCACAAGGT GGTGGAACCG GGGGAGCAGG 1380 

AGCACAGCTT CGCCTTCAAC ATCATGGAGA TCCAGCCCTT CCGCCGCGCG GCTGCCATCC 1440 

AGACCATGTC GCTGGATGCT GAGCGGAGGA AGCTGTATGT GAGCTCCCAG TGGGAGGTGA 1500 

GCCAGGTGCC CCTGGACCTG TGTGAGGTCT ATGGCGGGGG CTGCCACGGT TGCCTCATGT 1560 

CCCGAGACCC CTACTGCGGC TGGGACCAGG GCCGCTGCAT CTCCATCTAC AGCTCCGAAC 1620 

GGTCAGTGCT GCAATCCATT AATCCAGCCG AGCCACACAA GGAGTGTCCC AACCCCAAAC 1680 

CAGACAAGGC CCCACTGCAG AAGGTTTCCC TGGCCCCAAA CTCTCGCTAC TACCTGAGCT 1740 

GCCCCATGGA ATCCCGCCAC GCCACCTACT CATGGCGCCA CAAGGAGAAC GTGGAGCAGA 1800 

GCTGCGAACC TGGTCACCAG AGCCCCAACT GCATCCTGTT CATCGAGAAC CTCACGGCGC 1860 

AGCAGTACGG CCACTACTTC TGCGAGGCCC AGGAGGGCTC CTACTTCCGC GAGGCTCAGC 1920 



ACTGGCAGCT GCTGCCCGAG GACGGCATCA TGGCCGAGCA CCTGCTGGGT CATGCCTGTG 1980 

CCCTGGCTGC CTGAATTCGA AGCTTGGAGT CGACTCTGCT GAAGAGGAGG AAATTCTCCT 2040 

TGAAGTTTCC CTGGTGTTCA AAGTAAAGGA GTTTGCACCA GACGCACCTC TGTTCACTGG 2100 

TCCGGCGTAT TAAAACACGA TACATTGTTA TTAGTACATT TATTAAGCGC TAGATTCTGT 2160 

GCGTTGTTGA TTTACAGACA ATTGTTGTAC GTATTTTAAT AATTCATTAA ATTTATAATC 2220 

TTTAGGGTGG TATGTTAGAG CGAAAATCAA ATGATTTTCA GCGTCTTTAT ATCTGAATTT 2280 

AAATATTAAA TCCTCAATAG ATTTGTAAAA TAGGTTTCGA TTAGTTTCAA ACAAGGGTTG 2340 

TTTTTCCGAA CCGATGGCTG GACTATCTAA TGGATTTTCG CTCAACGCCA CAAAACTTGC 2400 

CAAATCTTGT AGCAGCAATC TAGCTTTGTC GATATTCGTT TGTGTTTTGT TTTGTAATAA 2460 

AGGTTCGACG TCGTTCAAAA TATTATGCGC TTTTGTATTT CTTTCATCAC TGTCGTTAGT 2520 

GTACAATTGA CTCGACGTAA ACACGTTAAA TAAAGCCTGG ACATATTTAA CATCGGGCGT 2580 

GTTAGCTTTA TTAGGCCGAT TATCGTCGTC GTCCCAACCC TCGTCGTTAG AAGTTGCTTC 2640 

CGAAGACGAT TTTGCCATAG CCACACGACG CCTATTAATT GTGTCGGCTA ACACGTCCGC 2700 

GATCAAATTT GTAGTTGAGC TTTTTGGAAT TATTTCTGAT TGCGGGCGTT TTTGGGCGGG 2760 

TTTCAATCTA ACTGTGCCCG ATTTTAATTC AGACAACACG TTAGAAAGCG ATGGTGCAGG 2820 

CGGTGGTAAC ATTTCAGACG GCAAATCTAC TAATGGCGGC GGTGGTGGAG CTGATGATAA 2880 

ATCTACCATC GGTGGAGGCG CAGGCGGGGC TGGCGGCGGA GGCGGAGGCG GAGGTGGTGG 2940 

CGGTGATGCA GACGGCGGTT TAGGCTCAAA TTGTCTCTTT CAGGCAACAC AGTCGGCACC 3000 

TCAACTATTG TACTGGTTTC GGGCGTATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 3060 

GCATAGTTAA GCCAGCCCCG ACACCCGCCA ACACCCGCTG ACGCGCCCTG ACGGGCTTGT 3120 

CTGCTCCCGG CATCCGCTTA CAGACAAGCT GTGACCGTCT CCGGGAGCTG CATGTGTCAG 3180 

AGGTTTTCAC CGTCATCACC GAAACGCGCG AGACGAAAGG GCCTCGTGAT ACGCCTATTT 3240 

TTATAGGTTA ATGTCATGAT AATAATGGTT TCTTAGACGT CAGGTGGCAC TTTTCGGGGA 3300 

AATGTGCGCG GAACCCCTAT TTGTTTATTT TTCTAAATAC ATTCAAATAT GTATCCGCTC 3360 

ATGAGACAAT AACCCTGATA AATGCTTCAA TAATATTGAA AAAGGAAGAG TATGAGTATT 3420 

CAACATTTCC GTGTCGCCCT TATTCCCTTT TTTGCGGCAT TTTGCCTTCC TGTTTTTGCT 3480 

CACCCAGAAA CGCTGGTGAA AGTAAAAGAT GCTGAAGATC AGTTGGGTGC ACGAGTGGGT 3540 

TACATCGAAC TGGATCTCAA CAGCGGTAAG ATCCTTGAGA GTTTTCGCCC CGAAGAACGT 3600 

TTTCCAATGA TGAGCACTTT TAAAGTTCTG CTATGTGGCG CGGTATTATC CCGTATTGAC 3660 

GCCGGGCAAG AGCAACTCGG TCGCCGCATA CACTATTCTC AGAATGACTT GGTTGAGTAC 3720 

TCACCAGTCA CAGAAAAGCA TCTTACGGAT GGCATGACAG TAAGAGAATT ATGCAGTGCT 3780 

GCCATAACCA TGAGTGATAA CACTGCGGCC AACTTACTTC TGACAACGAT CGGAGGACCG 3840 

AAGGAGCTAA CCGCTTTTTT GCACAACATG GGGGATCATG TAACTCGCCT TGATCGTTGG 3900 

GAACCGGAGC TGAATGAAGC CATACCAAAC GACGAGCGTG ACACCACGAT GCCTGTAGCA 3960 

ATGGCAACAA CGTTGCGCAA ACTATTAACT GGCGAACTAC TTACTCTAGC TTCCCGGCAA 4020 

CAATTAATAG ACTGGATGGA GGCGGATAAA GTTGCAGGAC CACTTCTGCG CTCGGCCCTT 4080 

CCGGCTGGCT GGTTTATTGC TGATAAATCT GGAGCCGGTG AGCGTGGGTC TCGCGGTATC 4140 

ATTGCAGCAC TGGGGCCAGA TGGTAAGCCC TCCCGTATCG TAGTTATCTA CACGACGGGG 4200 

AGTCAGGCAA CTATGGATGA ACGAAATAGA CAGATCGCTG AGATAGGTGC CTCACTGATT 4260 

AAGCATTGGT AACTGTCAGA CCAAGTTTAC TCATATATAC TTTAGATTGA TTTAAAACTT 4320 

CATTTTTAAT TTAAAAGGAT CTAGGTGAAG ATCCTTTTTG ATAATCTCAT GACCAAAATC 4380 

CCTTAACGTG AGTTTTCGTT CCACTGAGCG TCAGACCCCG TAGAAAAGAT CAAAGGATCT 4440 






TCTTGAGATC CTTTTTTTCT GCGCGTAATC TGCTGCTTGC AAACAAAAAA ACCACCGCTA 4500 

CCAGCGGTGG TTTGTTTGCC GGATCAAGAG CTACCAACTC TTTTTCCGAA GGTAACTGGC 4560 

TTCAGCAGAG CGCAGATACC AAATACTGTT CTTCTAGTGT AGCCGTAGTT AGGCCACCAC 4620 

TTCAAGAACT CTGTAGCACC GCCTACATAC CTCGCTCTGC TAATCCTGTT ACCAGTGGCT 4680 

GCTGCCAGTG GCGATAAGTC GTGTCTTACC GGGTTGGACT CAAGACGATA GTTACCGGAT 4740 

AAGGCGCAGC GGTCGGGCTG AACGGGGGGT TCGTGCACAC AGCCCAGCTT GGAGCGAACG 4800 

ACCTACACCG AACTGAGATA CCTACAGGGT GAGCTATGAG AAAGCGCCAC GCTTCCCGAA 4860 

GGGAGAAAGG CGGACAGGTA TCCGGTAAGC GGCAGGGTCG GAACAGGAGA GCGCACGAGG 4920 

GAGCTTCCAG GGGGAAACGC CTGGTATCTT TATAGTCCTG TCGGGTTTCG CCACCTCTGA 4980 

CTTGAGCGTC GATTTTTGTG ATGCTCGTCA GGGGGGCGGA GCCTATGGAA AAACGCCAGC 5040 

AACGCGGCCT TTTTACGGTT CCTGGCCTTT TGCTGGCCTT TTGCTCACAT GTTCTTTCCT 5100 

GCGTTATCCC CTGATTCTGT GGATAACCGT ATTACCGCCT TTGAGTGAGC TGATACCGCT 5160 

CGCCGCAGCC GAACGACCGA GCGCAGCGAG TCAGTGAGCG AGGAAGCATC CTGCACCATC 5220 

GTCTGCTCAT CCATGACCTG ACCATGCAGA GGATGATGCT CGTGACGGTT AACGCCTCGA 5280 

ATCAGCAACG GCTTGCCGTT CAGCAGCAGC AGACCATTTT CAATCCGCAC CTCGCGGAAA 5340 

CCGACATCGC AGGCTTCTGC TTCAATCAGC GTGCCGTCGG CGGTGTGCAG TTCAACCACC 5400 

GCACGATAGA GATTCGGGAT TTCGGCGCTC CACAGTTTCG GGTTTTCGAC GTTCAGACGT 5460 

AGTGTGACGC GATCGGTATA ACCACCACGC TCATCGATAA TTTCACCGCC GAAAGGCGCG 5520 

GTGCCGCTGG CGACCTGCGT TTCACCCTGC CATAAAGAAA CTGTTACCCG TAGGTAGTCA 5580 

CGCAACTCGC CGCACATCTG AACTTCAGCC TCCAGTACAG CGCGGCTGAA ATCATCATTA 5640 

AAGCGAGTGG CAACATGGAA ATCGCTGATT TGTGTAGTCG GTTTATGCAG CAACGAGACG 5700 

TCACGGAAAA TGCCGCTCAT CCGCCACATA TCCTGATCTT CCAGATAACT GCCGTCACTC 5760 

CAACGCAGCA CCATCACCGC GAGGCGGTTT TCTCCGGCGC GTAAAAATGC GCTCAGGTCA 5820 

AATTCAGACG GCAAACGACT GTCCTGGCCG TAACCGACCC AGCGCCCGTT GCACCACAGA 5880 

TGAAACGCCG AGTTAACGCC ATCAAAAATA ATTCGCGTCT GGCCTTCCTG TAGCCAGCTT 5940 

TCATCAACAT TAAATGTGAG CGAGTAACAA CCCGTCGGAT TCTCCGTGGG AACAAACGGC 6000 

GGATTGACCG TAATGGGATA GGTCACGTTG GTGTAGATGG GCGCATCGTA ACCGTGCATC 6060 

TGCCAGTTTG AGGGGACGAC GACAGTATCG GCCTCAGGAA GATCGCACTC CAGCCAGCTT 6120 



TCCGGCACCG CTTCTGGTGC CGGAAACCAG GCAAAGCGCC ATTCGCCATT CAGGCTGCGC 6180 

AACTGTTGGG AAGGGCGATC GGTGCGGGCC TCTTCGCTAT TACGCCAGCT GGCGAAAGGG 6240 

GGATGTGCTG CAAGGCGATT AAGTTGGGTA ACGCCAGGGT TTTCCCAGTC ACGACGTTGT 6300 

AAAACGACGG GATCTATCAT TTTTAGCAGT GATTCTAATT GCAGCTGCTC TTTGATACAA 6360 

CTAATTTTAC GACGACGATG CGAGCTTTTA TTCAACCGAG CGTGCATGTT TGCAATCGTG 6420 

CAAGCGTTAT CAATTTTTCA TTATCGTATT GTTGCACATC AACAGGCTGG ACACCACGTT 6480 

GAACTCGCCG CAGTTTTGCG GCAAGTTGGA CCCGCCGCGC ATCCAATGCA AACTTTCCGA 6540 

CATTCTGTTG CCTACGAACG ATTGATTCTT TGTCCATTGA TCGAAGCGAG TGCCTTCGAC 6600 

TTTTTCGTGT CCAGTGTGGC TT 6622 

(2) INFORMATION FOR SEQ ID NO:43: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 

CCGGATCCGC CCAGGGCCAC CTAAGGAGCG G 31 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 
CTGAATTCAG GAGCCAGGGC ACAGGCATG 29 



TGTCAGGACA AACAGGACTG TGGGAATTAC ATCACTCTTC TAGAAAGGCG GGGTAATGGG 
CTGCTGGTCT GTGGCACCAA TGCCCGGAAG CCCAGCTGCT GGAACTTGGT GAATGACAGT 
GTGGTGATGT CACTTGGTGA GATGAAAGGC TATGCCCCCT TCAGCCCGGA TGAGAACTCC 
CTGGTTCTGT TTGAAGGAGA TGAAGTGTAC TCTACCATCC GGAAGCAGGA ATACAACGGG 
AAGATCCCTC GGTTTCGACG CATTCGGGGC GAGAGTGAAC TGTACACAAG TGATACAGTC 
ATGCAGAACC CACAGTTCAT CAAGGCCACC ATTGTGCACC AAGACCAAGC CTATGATGAT 
AAGATCTACT ACTTCTTCCG AGAAGACAAC CCTGACAAGA ACCCCGAGGC TCCTCTCAAT 
GTGTCCCGAG TAGCCCAGTT GTGCAGGGGG GACCAGGGTG GTGAGAGTTC GTTGTCTGTC 
TCCAAGTGGA ACACCTTCCT GAAAGCCATG TTGGTCTGCA GCGATGCAGC CACCAACAGG 
AACTTCAATC GGCTGCAAGA TGTCTTCCTG CTCCCTGACC CCAGTGGCCA GTGGAGAGAT 
ACCAGGGTCT ATGGCGTTTT CTCCAACCCC TGGAACTACT CAGCTGTCTG CGTGTATTCG 
CTTGGTGACA TTGACAGAGT CTTCCGTACC TCATCGCTCA AAGGCTACCA CATGGGCCTT 
TCCAACCCTC GACCTGGCAT GTGCCTCCCA AAAAAGCAGC CCATACCCAC AGAAACCTTC 
CAGGTAGCTG ATAGTCACCC AGAGGTGGCT CAGAGGGTGG AACCTATGGG GCCCC 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 666 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: n/a 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: amino acid 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

Met Thr Pro Pro Pro Pro Gly Arg Ala Ala Pro Ser Ala Pro Arg Ala 
1 5 10 15 

Arg Val Pro Gly Pro Pro Ala Arg Leu Gly Leu Pro Leu Arg Leu Arg 
20 25 30 

Leu Leu Leu Leu Leu Trp Ala Ala Ala Ala Ser Ala Gln Gly His Leu 
35 40 45 

Arg Ser Gly Pro Arg Ile Phe Ala Val Trp Lys Gly His Val Gly Gln 
50 55 60 

Asp Arg Val Asp Phe Gly Gln Thr Glu Pro His Thr Val Leu Phe His 



